Abstract

The multi-domain non-structural protein 3 (Nsp3) is the largest protein 
encoded by the coronavirus (CoV) genome, with an average molecular mass 
of about 200 kD. Nsp3 is an essential component of the 
replication/transcription complex. It comprises various domains, the 
organization of which differs between CoV genera, due to duplication or 
absence of some domains. However, eight domains of Nsp3 exist in all known 
CoVs: the ubiquitin-like domain 1 (Ubl1), the Glu-rich acidic domain (also 
called “hypervariable region”), a macrodomain (also named “X domain”), the 
ubiquitin-like domain 2 (Ubl2), the papain-like protease 2 (PL2pro), the Nsp3 
ectodomain (3Ecto, also called “zinc finger domain”), as well as the domains 
Y1 and CoV-Y of unknown functions. In addition, the two transmembrane 
regions, TM1 and TM2, exist in all CoVs. The three-dimensional structures of 
domains in the N-terminal two thirds of Nsp3 have been investigated by X-ray 
crystallography and/or nuclear magnetic resonance (NMR) spectroscopy since 
the outbreaks of Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) 
in 2003 as well as Middle-East Respiratory Syndrome coronavirus 
(MERS-CoV) in 2012. In this review, the structures and functions of these 
domains of Nsp3 are discussed in depth. This article is part of the series “From 
SARS to MERS: Research on highly pathogenic human coronaviruses” 
(Hilgenfeld & Peiris, Antiviral Res. 100, 286-295 (2013)). 

Keywords: ubiquitin-like domain; papain-like protease; macrodomain; 
nucleic-acid binding domain; innate immunity; structural biology 

Abbreviations: GST, glutathione S-transferase; IRF, interferon regulatory 
factor; NF-κB, nuclear factor kappa-light-chain-enhancer of activated B cells; 
TAB3, TGF-beta-activated kinase 1 and MAP3K7-binding protein 3; Bbc3, 
Bcl-2-binding component 3; TRAF, TNF receptor-associated factor; RIG-I, 
retinoic acid-inducible gene I; STING, stimulator of interferon genes; TBK1, 
TANK-binding kinase 1; MDM2, mouse double minute 2 homolog; RCHY1, 
RING finger and CHY zinc finger domain-containing protein 1; PAIP1, 
poly(A)-binding protein-interacting protein 1; MKRN, makorin ring finger 
protein. 

1. Introduction 

This review of published research on the coronavirus non-structural protein 3 
(Nsp3) forms part of a series in Antiviral Research on “From SARS to MERS: 
research on highly pathogenic human coronaviruses” (Hilgenfeld & Peiris, 
2013). Two excellent earlier papers dealt with aspects of Nsp3. The first 
described the state of knowledge of the papain-like protease (PLpro) 
(Baez-Santos et al., 2015), while the second adopted a bioinformatics 
viewpoint when describing Nsp3 and other non-structural proteins involved in 
anchoring the coronavirus replication/transcription complex (RTC) to modified 
membranous structures originating from the endoplasmic reticulum (ER) 
(Neuman, 2016). We build on these fine reviews, focusing on recent results 
and discussing the structures and functions of the individual Nsp3 domains in 
sequential order. 

Coronaviruses (CoVs) are members of the subfamily Coronavirinae within the 
family Coronaviridae of the order Nidovirales. They are enveloped 
positive-sense single-stranded RNA (+ssRNA) viruses with the largest genomes 
of all RNA viruses known thus far (Brian & Baric, 2005; Gorbalenya et al., 
2006). The genomes of different CoVs comprise between 26 and 32 kilobases; 
however, the overall organization of the genomes is similar. The 5'-terminal 
two thirds of the genome include two open reading frames (ORFs), 1a and 1b, 
that together encode all non-structural proteins for the formation of the RTC, 
whereas the 3'-proximal third encodes the structural and accessory proteins 
(Fig. 1A; Brian & Baric, 2005). ORF1a encodes polyprotein (pp) 1a containing 
Nsp1-11, while ORF1a and ORF1b together produce pp1ab containing 
Nsp1-16 through a (-1) ribosomal frameshift overreading the stop codon of 
ORF1a (Fig. 1A; Brierley et al., 1989). Coronaviruses are divided into four 
genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and 
Deltacoronavirus (Adams & Carstens, 2012). CoVs can infect many species 
(Fehr & Perlman, 2015); currently, the coronaviruses infecting humans are all 
from the genera alpha-CoV or beta-CoV. HCoV 229E and HCoV NL63 belong 
to the former (Tyrrell & Bynoe, 1965; van der Hoek et al., 2004), whereas 
HCoV OC43, HKU1, SARS-CoV, and MERS-CoV belong to the latter genus 
(Hamre & Procknow, 1966; Woo et al., 2005; Drosten et al., 2003; Ksiazek et 
al., 2003; Kuiken et al., 2003; Peiris et al., 2003; Zaki et al., 2012). 
Furthermore, HCoV OC43 and HKU1 belong to clade A of beta-CoV, while the 
two highly pathogenic human CoVs, SARS-CoV and MERS-CoV, are from 
clades B and C, respectively. 

Nsp3 is the largest multi-domain protein produced by coronaviruses (Fig. 1A). 
It features a somewhat different domain organization in different CoV genera. 
The individual coronaviruses can possess 10 to 16 domains of which eight 
domains and two transmembrane regions are conserved, according to a 
recent bioinformatic analysis (Neuman, 2016). The domain organization of 
Nsp3 from HCoV NL63 as a representative of alpha-CoVs, and from 
SARS-CoV in clade B of the genus beta-CoV is displayed in Fig. 1A. Nsp3 is 
released from pp1a/1ab by the papain-like protease domain(s), which is (are) 
part of Nsp3 itself (Fig. 1A; Ziebuhr et al., 2000). Nsp3 plays many roles in the 
viral life cycle (Fig. 1B). It can act as a scaffold protein to interact with itself and 
to bind other viral Nsps or host proteins (von Brunn et al., 2007; Pan et al., 
2008; Imbert et al., 2008; Pfefferle et al., 2011; Ma-Lauer et al., 2016). In 
particular, Nsp3 is essential for RTC formation (van Hemert et al., 2008; 
Angelini et al., 2013). The RTC is associated with modified host ER 
membranes that produce convoluted membranes (CMs) and 
double-membrane vesicles (DMVs) in SARS-CoV-, MHV (mouse hepatitis 
virus)- as well as MERS-CoV-infected cells (Snijder et al., 2006; Knoops et al., 
2008; Hagemeijer et al., 2011; de Wilde et al., 2013). Nsp3 and Nsp5 were 
detected on the CMs in SARS-CoV-infected cells by immunogold electron 
microscopy (Knoops et al., 2008). Co-expression of Nsp3, Nsp4, and Nsp6 
can induce DMV formation in SARS-CoV-infected cells but the same result 
was not observed when Nsp3 lacking its C-terminal third (residues 1319-1922) 
was co-expressed with Nsp4 and Nsp6 (Angelini et al., 2013). Correspondingly, 
co-expression of only the C-terminal third of Nsp3 (residues 1256-1922) and 
Nsp4 induces the occurrence of the zippered ER and membrane curvature in 
SARS-CoV- or MHV-infected cells, which is likely to enhance DMV formation 
(Hagemeijer et al., 2014). Overall, Nsp3 is a key component for coronavirus 
replication; however, many functions of Nsp3 remain to be investigated. In this 
review, the current knowledge on the structures and functions of the individual 
Nsp3 domains is summarized and discussed. 

2. Ubiquitin-like domain 1 and the Glu-rich acidic region 

The ubiquitin-like domain 1 (Ubl1) and the Glu-rich acidic region are located at 
the N-terminus of Nsp3. These two regions together are also named “Nsp3a” 
(Neuman et al., 2008). Nsp3a exists in all CoVs in spite of no more than 15% 
amino-acid sequence identity between the domains in CoVs from different 
genera. 

Two Ubl1 structures from betacoronaviruses of different clades have been 
determined by NMR spectroscopy so far (Table 1); one is from SARS-CoV in 
clade B (Serrano et al., 2007) and the other from MHV in clade A (Keane & 
Giedroc, 2013). In SARS-CoV, the Ubl1 comprises residues 1-112; the core 
residues 20-108 form a typical ubiquitin-like fold with secondary-structure 
elements in the following order: β1-α1-β2-α2-η1-α3-β3-β4 (η: 3₁₀ helix; Fig. 
2A; Serrano et al., 2007); residues outside this core are flexible. The 
well-defined structure of MHV Ubl1 (residues 19-114) with the 
secondary-structure elements β1-α1-β2-α2-α3-β3-β4 is similar to that of 
SARS-CoV Ubl1 (Fig. 2B), with a root-mean-square deviation (R.M.S.D.) of 
2.8 Å (for 85 out of 95 Cα atoms; Z-score: 7.4) according to the Dali server 
(Holm & Rosenström, 2010). A structural difference between the two Ubl1 
domains is that the two disjoined helices η1-α3 in SARS-CoV Ubl1 are 
replaced by one long continuous helix (α3) in MHV Ubl1 (Fig. 2A and B). 
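
The R.M.S.D. values quoted in this review can be illustrated with a short sketch. It assumes that the two sets of Cα coordinates have already been superimposed and paired atom by atom; the structural alignment that the Dali server carries out beforehand is not reproduced here. The function name and the toy coordinates are hypothetical and serve only as an illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates that are already superimposed and paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Toy example (made-up coordinates in Å): every atom is displaced by 2 Å along x
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
print(rmsd(a, b))
```

Because the value depends on which atoms are paired, an R.M.S.D. is only meaningful together with the number of aligned Cα atoms, which is why both are reported side by side in the text.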

The known functional roles of Ubl1 in CoVs are related to ssRNA binding and 
interacting with the nucleocapsid (N) protein (Fig. 1B; Serrano et al., 2007; 
Hurst et al., 2010, 2013). The Ubl1 of SARS-CoV binds single-stranded RNA 
(ssRNA) containing AUA patterns. Surprisingly, many negatively charged 
regions (such as the 3₁₀ helix, η1) show obvious conformational changes in the 
NMR spectra when RNA is added to the protein solution (Serrano et al., 2007), 
indicating that RNA binding has long-range effects on the protein conformation. 
In view of the presence of several AUA repeats in the 5'-untranslated region 
(UTR) of the SARS-CoV genome, the Ubl1 likely binds to this region. 
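
A simple way to scan an RNA sequence for the AUA patterns mentioned above is sketched below. The helper name and the toy sequence are hypothetical; the sequence is made up and is not the SARS-CoV 5'-UTR. A motif count of this kind only indicates candidate sites and says nothing about actual binding.

```python
def count_motif(rna, motif="AUA"):
    """Count occurrences of motif in rna, allowing overlapping hits.
    DNA-style input (T instead of U) is converted to RNA first."""
    rna = rna.upper().replace("T", "U")
    count = 0
    start = rna.find(motif)
    while start != -1:
        count += 1
        start = rna.find(motif, start + 1)  # step by one to keep overlaps
    return count

toy_utr = "GAUAUAGGAUACC"  # made-up sequence for illustration only
print(count_motif(toy_utr))
```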

In MHV, the Ubl1 domain efficiently binds the cognate nucleocapsid (N) protein; 
thus it seems to be important for virus replication as well as initiation of viral 
infection. There is a critical relationship between Nsp3 interaction with the N 
protein and infectivity, as this interaction serves to tether the viral genome to 
the newly translated RTC at an early stage of coronavirus infection (Hurst et al., 
2010, 2013). Deletion of the Ubl1 core (residues 19-111) of MHV abrogates 
viral replication (Hurst et al., 2013). The major interface regions of the 
Ubl1-N complex involve acidic residues of Ubl1 helix α2 and the serine- and 
arginine-rich region (SR-rich region) of the N protein, as shown by NMR 
titration experiments (Keane & Giedroc, 2013). However, the acidic residues in 
helix α2 are not absolutely conserved among different CoVs, implying that the 
details of the interactions between Ubl1 and N protein will not be the same. In 
addition, the binding affinity between the bovine coronavirus (BCoV) N 
(residues 57-216) and MHV Ubl1 is about 260-fold lower compared to MHV N 
(residues 60-219) and its cognate Ubl1 (Keane & Giedroc, 2013). A structure 
of the Ubl1-N complex would help understand why non-cognate Ubl1 and N 
protein bind weakly to each other. Thus far, only a computer docking model of 
the MHV Ubl1-N complex has been reported (Tatar & Tok, 2016). This model 
proposes that residues of β1, α1, the loop between β1 and α1, β3, and β4 of 
MHV Ubl1 interact with the N-terminal domain (NTD) as well as the SR-rich 
region of the N protein. In contrast to the NMR titration results described 
above, most acidic residues of Ubl1 helix α2 do not interact with the SR-rich 
region of N in the docking model (Tatar & Tok, 2016). 

The interaction between the N protein and nucleic acid is essential for CoV 
genome transcription (Chang et al., 2014). The NTD plus the SR-rich region 
(residues 60-219) of MHV N play an important role in interacting with 
transcriptional regulatory sequence (TRS) RNA (Grossoehme et al., 2009). 
The N-TRS RNA complex prevents the formation of the Ubl1-N complex 
(Keane & Giedroc, 2013). The competition between N protein binding to either 
the TRS or the Ubl1 might be connected to the switch between viral 
transcription and replication. It has been shown that the SR region of N protein 
can be phosphorylated (Peng et al., 2008). Each of two phosphomimetic 
substitutions of serine residues predicted to be phosphorylated (S207D and 
S218D) in the SR region of MHV N decreases the binding affinity to Ubl1 by 
about 3-fold, compared to wild-type N (Keane & Giedroc, 2013). 
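
To put the fold changes in binding affinity quoted above on an energy scale, they can be converted into differences in binding free energy via ΔΔG = RT ln(fold). The sketch below assumes that "n-fold lower affinity" means an n-fold higher dissociation constant and that the temperature is 25 °C (298.15 K); the temperature of the original measurements is not stated here, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold_change(fold, temperature_k=298.15):
    """Binding free-energy difference (kcal/mol) corresponding to a
    fold change in the dissociation constant: ddG = R*T*ln(fold)."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold)

# Fold changes taken from the text; 25 °C is an assumption
for label, fold in [("BCoV N vs. MHV N binding to MHV Ubl1", 260.0),
                    ("phosphomimetic S207D or S218D vs. wild-type N", 3.0)]:
    print(f"{label}: {ddg_from_fold_change(fold):.2f} kcal/mol")
```

Under these assumptions, the 260-fold difference corresponds to roughly 3.3 kcal/mol, whereas a 3-fold difference amounts to less than 1 kcal/mol.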

The overall structure of the SARS-CoV Ubl1 domain is similar to human 
ubiquitin (Ub) and that of each of the two ubiquitin-like domains of human or 
mouse interferon-stimulated gene 15 (ISG15) (Fig. 2D and E; Vijay-Kumar et 
al., 1987; Narasimhan et al., 2005; Daczkowski et al., 2017). In human Ub as 
well as in the ISG15s, only a short 3₁₀ helix is found at the position of η1-α3 or 
α3 in Ubl1 of SARS-CoV or MHV (Fig. 2D and E). Ub and ISG15 are important 
for innate antiviral immunity (Heaton et al., 2016; Morales & Lenschow, 2013); 
therefore, viruses tend to not only inhibit the conjugation of Ub or ISG15 to 
targets but also remove Ub or ISG15 from ubiquitinated or ISGylated proteins, 
respectively (Yuan & Krug, 2001; Bakshi et al., 2013; Yang et al., 2014). Thus, 
in CoVs, one or two papain-like protease (PLpro) domain(s) within Nsp3 
possess deubiquitinating (DUB) and deISGylating activities (see below; for a 
recent review on the role of viral proteases in counteracting the host-cell's 
innate immune system, see Lei & Hilgenfeld (2017)). Interestingly, two 
ubiquitin-like domains (Ubl1 and Ubl2) exist in all CoVs (see below; Neuman, 
2016). Considering that ubiquitin-like modules are often involved in 
protein-protein interactions to regulate various biological processes 
(Hochstrasser, 2009), such as the MHV Ubl1-N interaction mentioned above, 
a novel possible function of Ub-like domains in CoVs might be the interaction 
with target proteins of Ub (or ISG15) by mimicking the shape of these two 
molecules. The purpose of such mimicry could be to somehow interfere with 
pathways involving ubiquitinylated or ISGylated host targets, thereby leading 
to disruption of host anti-viral signal transduction or protein degradation. 

The Ubl1 of SARS-CoV is also similar to the Ras-interacting domain (RID) of 
RalGDS (Ral guanine nucleotide dissociation stimulator; Fig. 2F; Serrano et al., 
2007). Ras regulates cell-cycle progression via binding to the RID of 
Ras-interacting proteins (Hofer et al., 1994; Huang et al., 1998; Coleman et al., 
2004). By mimicking the RID, the Ubl1 might interrupt the interactions between 
Ras and its effectors, thus affecting the cell cycle to support virus replication. In 
agreement with this, it is known that both MHV and SARS-CoV induce 
cell-cycle arrest in the G0/G1 phase (Chen & Makino, 2004; Yuan et al., 2005). 

Following the Ubl1, the second subdomain of Nsp3a in CoVs is the Glu-rich 
acidic region. It comprises residues 113-183 of SARS-CoV Nsp3, with more 
than 35% Glu and 10% Asp (Serrano et al., 2007). Because of the 
non-conserved amino-acid sequence, this region is also designated as 
“hypervariable region (HVR)” (Neuman, 2016). The HVR is intrinsically 
disordered in SARS-CoV and in MHV (Serrano et al., 2007; Keane & Giedroc, 
2013) and does not affect the conformation of the globular Ubl1 domain in 
SARS-CoV (Serrano et al., 2007). Currently, the function of the HVR in CoVs is 
unknown. Glu/Asp-rich proteins often fulfil biological roles 
such as DNA/RNA mimicry, metal-ion binding, and protein-protein interactions 
(Chou & Wang, 2015). The Ubl1+HVR region has been demonstrated via a 
yeast-two-hybrid (Y2H) assay to interact with SARS-CoV Nsp6, whereas a 
GST pull-down study identified Nsp8, Nsp9, and NAB-βSM-TM1 of Nsp3 (NAB: 
nucleic-acid binding domain; βSM: betacoronavirus-specific marker; TM1: 
transmembrane region 1; see below) as binding partners (Imbert et al., 2008). 
Does the HVR play any role in these protein-protein interactions? This 
question is yet to be answered. Furthermore, the acidic region is dispensable 
for MHV replication (Hurst et al., 2013). On the other hand, this region does 
exist in all CoVs. It is conceivable that it may have regulatory rather than 
essential roles in the coronavirus replication process. However, the exact 
role(s) of the acidic region in CoVs should be further investigated. 

3. Papain-like protease 1 domain 

The papain-like protease domain(s) is/are responsible for releasing Nsp1, 
Nsp2, and Nsp3 from the N-terminal region of polyproteins 1a/1ab in CoVs 
(Harcourt et al., 2004; Barretto et al., 2005). The papain-like protease 1 
domain (PL1pro) follows the HVR (see Fig. 1A) in the alpha-CoVs and in 
clade A of beta-CoVs (Graham & Denison, 2006; Ziebuhr et al., 2001; Chen et 
al., 2007; Wojdyla et al., 2010; Neuman, 2016). Interestingly, the PL1pro is not 
complete in the gamma-CoV infectious bronchitis virus (IBV; Ziebuhr et al., 
2001) and in Hipposideros pratti bat CoV, a virus related to clade B of the 
beta-CoVs (GenBank code NC_025217.1; Neuman, 2016). In these latter 
viruses, some parts (such as the zinc-finger motif; see below) and the residues 
of the catalytic triad of the PL1pros are missing. Furthermore, the PL1pro is 
totally absent in beta-CoV clades B, C, and D as well as in delta-CoVs. Thus, 
neither of the two highly pathogenic human CoVs, SARS-CoV (Fig. 1A) and 
MERS-CoV, has a PL1pro domain; they only possess the other papain-like 
protease, the PL2pro domain that is conserved in all coronaviruses (see below). 
It is still not clear why certain CoVs encode two PLpros. 

Thus far, only one structure of a PL1pro domain has been determined, that from 
the alpha-CoV Transmissible Gastroenteritis Virus (TGEV) (Table 1; Wojdyla 
et al., 2010). The PL1pro resembles an extended right-hand scaffold with thumb, 
palm, and fingers subdomains (Fig. 3). It contains a zinc finger in the fingers 
subdomain as well as a catalytic triad, Cys32-His183-Asp196. A canonical 
oxyanion hole as known from papain (Menard et al., 1991) is present in TGEV 
PL1pro, with the main-chain amide of the catalytic cysteine residue and the 
side-chain of a glutamine residue (Gln27) 5 residues N-terminal to the cysteine 
contributing to the stabilization of the oxyanion transition state of peptide 
hydrolysis (Fig. 3; Wojdyla et al., 2010). The fold of the PL1pro is similar to that 
of the PL2pro of SARS-CoV (see below; R.M.S.D. 3.1 Å, for 202 out of 211 Cα 
atoms; Dali Z-score = 18.4) and MERS-CoV (R.M.S.D. 3.1 Å, for 198 out of 
211 Cα atoms; Z-score = 18.7) as well as to several human ubiquitin-specific 
proteases (USPs, such as USP2, 7, 14, 21, etc.; Z-scores from 11.2 to 12.6) 
(Ratia et al., 2006; Wojdyla et al., 2010; Lei et al., 2014). The biggest 
difference between these PLpro domains is found in the zinc-finger regions, 
which are obviously flexible (Wojdyla et al., 2010; Lei et al., 2014). 
Furthermore, the electrostatic surface potential of TGEV PL1pro features two 
negative patches which are absent in SARS-CoV PL2pro (Wojdyla et al., 2010). 
One patch is located at the opposite side of the active site, between the thumb 
and palm subdomains, and the other is near the active-site groove and the 
surrounding region, between the thumb and fingers subdomains (Wojdyla et 
al., 2010). The latter patch is related to the substrate binding and specificity of 
TGEV PL1pro (Wojdyla et al., 2010). 

The PL1pro of TGEV has been demonstrated to process the cleavage site 
Nsp2↓3 (↓: cleavage site) and to exhibit DUB activity to remove ubiquitin from 
Lys48-/Lys63-linked Ub chains in vitro (Putics et al., 2006; Wojdyla et al., 
2010). The P4-P1 residues of the cleavage site between Nsp2 and 3 are 
Lys-Met-Gly-Gly in TGEV (Table 2), while the last four residues of ubiquitin 
are Leu-Arg-Gly-Gly. Therefore, the S4 pocket of TGEV PL1pro should be 
able to accommodate residues as different as Lys and Leu. In contrast, the 
P4-P1 residues in the polyprotein substrates of PL2pro are Leu-Xaa-Gly-Gly 
(Xaa is Asn or Lys) in SARS-CoV (Harcourt et al., 2004; Barretto et al., 2005). 
P4 is the same residue as in the ubiquitin substrate; thus, the corresponding 
pocket of SARS-CoV PL2pro is tailor-made for leucine. Residues Ile155, 
Tyr175, and Thr209 form the S4 subsite in TGEV PL1pro (Fig. 3; Wojdyla et al., 
2010), whereas the corresponding residues in SARS-CoV PL2pro are Pro249, 
Tyr265, and Thr302 (Ratia et al., 2006). When superimposing the structures, 
Wojdyla et al. (2010) found that the Cα atom of Ile155 of TGEV PL1pro is 3 Å 
away from the Cα atom of the corresponding Pro249 in SARS-CoV PL2pro, 
thereby creating a larger S4 pocket in TGEV PL1pro, so that it can bind lysine, 
in addition to leucine. 
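
The P4-P1 recognition patterns discussed above can be written as simple sequence motifs. The sketch below is only an illustration: the regular expressions encode the residues named in the text, the helper name is hypothetical, and a motif hit is not evidence that a site is actually cleaved, since recognition also depends on flanking residues and on the three-dimensional context.

```python
import re

# P4-P1 patterns taken from the text; "." stands for any residue (Xaa)
LXGG = re.compile(r"L.GG")          # SARS-CoV PL2pro substrates and ubiquitin
TGEV_NSP2_3 = re.compile(r"KMGG")   # TGEV Nsp2/Nsp3 site (Lys-Met-Gly-Gly)

def find_sites(sequence, pattern):
    """Return (0-based index of P4, matched P4-P1 residues) for every hit.
    A lookahead is used so that overlapping hits are not missed."""
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]

ub_tail = "LRLRGG"  # C-terminal residues 71-76 of human ubiquitin
print(find_sites(ub_tail, LXGG))
```

The Leu-Xaa-Gly-Gly pattern matches the ubiquitin C-terminus (Leu-Arg-Gly-Gly) but not the TGEV Nsp2/Nsp3 site, which carries Lys at P4; this is the difference that the larger S4 pocket of TGEV PL1pro is thought to accommodate.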

As mentioned above, for reasons unknown so far, many CoVs contain two 
PLpros. Both PL1pro and PL2pro are involved in releasing Nsp1, Nsp2, and Nsp3 
in these CoVs. However, the two PLpros in different CoVs show varying 
substrate specificity. The PL1pro of MHV cleaves Nsp1↓2 and Nsp2↓3, while 
the PL2pro cleaves Nsp3↓4 (Table 2; Bonilla et al., 1997; Kanjanahaluethai & 
Baker, 2000). Human coronavirus NL63 (HCoV-NL63) PL1pro processes 
Nsp1↓2 while the PL2pro processes the other two cleavage sites, Nsp2↓3 and 
Nsp3↓4 (Table 2; Chen et al., 2007). Both PL1pro and PL2pro of HCoV 229E 
can cleave Nsp1↓2 and Nsp2↓3 (Table 2); however, the PL1pro is more efficient in 
cleaving Nsp1↓2 while the PL2pro is more efficient with respect to the latter site 
(Ziebuhr et al., 2007). Some viruses, such as SARS-CoV, MERS-CoV, and 
IBV, comprise only one functional PL2pro to process all three cleavage sites 
(Table 2). The residues (P5-P2') of the three cleavage sites are diversified in 
MHV, HCoV NL63, and HCoV 229E, although the P1 is conserved as a small 
residue (Gly or Ala) (Table 2). In contrast, the P1 and P2 residues (Gly-Gly or 
Ala-Gly) are absolutely identical in all the cleavage sites of SARS-CoV, 
MERS-CoV, and IBV; furthermore, the P4-P1 residues are - to a certain 
extent - conserved in each of these three viruses (Table 2). Therefore, the 
presence of two PLpros with slightly different substrate specificity in some CoVs 
may be required to cleave native substrates that deviate from the uniform ones 
processed by SARS-CoV, MERS-CoV, or IBV PLpros. Unfortunately, studies 
on the details of recognition of different substrates by PL1pro and PL2pro are 
hampered by the fact that no crystal structures of the two enzymes from the 
same virus are available. 

4. Macrodomains and the "Domain Preceding Ubl2 and PL2pro (DPUP)" 

(i) Macrodomain I (Mac1, X domain) 

A conserved macrodomain (also called “X domain”, Nsp3b) follows the HVR or 
the PL1pro domain in all coronaviruses (Fig. 1A; Gorbalenya et al., 1991; 
Neuman et al., 2008; Neuman, 2016). Macrodomains widely exist in bacteria, 
archaea, and eukaryotes (Han et al., 2011). In addition, these conserved 
domains are also present in several positive-sense ssRNA (+ssRNA) viruses 
of the families Hepeviridae, Togaviridae, and Coronaviridae, such as hepatitis 
E virus (HEV), alphavirus, rubivirus, and all coronaviruses (Koonin et al., 1992; 
Snijder et al., 2003). Our group has shown that the X domain (Mac1) is 
dispensable for RNA replication in the context of a SARS-CoV replicon (Kusov 
et al., 2015). Recently, evidence has accumulated showing that the X domain plays 
a role in counteracting the host innate immune response (Eriksson et al., 2008; 
Kuri et al., 2011; Fehr et al., 2015, 2016). 

The first crystal structure of an Nsp3 domain of any coronavirus was the 
unliganded X domain of SARS-CoV (Table 1; Saikatendu et al., 2005). A little 
later, the structure of the SARS-CoV X domain in complex with ADP-ribose 
(ADPr) was determined (Table 1; Egloff et al., 2006). Subsequently, structures 
of the unliganded X domain and/or its complex with ADPr from HCoV 229E, 
IBV, HCoV NL63, Feline CoV (FCoV), and MERS-CoV were reported (Table 1; 
Piotrowski et al., 2009; Xu et al., 2009; Wojdyla et al., 2009; Cho et al., 2016). 
All structures show that the X domain adopts a conserved three-layered α/β/α 
sandwich fold (Fig. 4). The domain with this fold is called a macrodomain 
because of its similarity to the extra domain in the MacroH2A variant of human 
histone 2A (Pehrson & Fried, 1992; Saikatendu et al., 2005). Typically, the X 
domain includes a central β sheet with seven β strands in the order 
β1-β2-β7-β6-β3-β5-β4, with β1 and β4 being antiparallel to the rest (Fig. 4). 
Only the X domain of IBV is an exception, since it lacks the first strand, β1 
(Piotrowski et al., 2009; Xu et al., 2009). Six helices are located on the two 
sides of this β sheet, with helices α1, α2, and α3 on one side and α4, α5, and 
α6 on the other (Fig. 4). 

One function of the conserved macrodomain is the binding of ADP-ribose or 
poly(ADP-ribose) (Han et al., 2011). The binding characteristics are the same 
in most X domains of coronaviruses (Egloff et al., 2006; Xu et al., 2009; 
Wojdyla et al., 2009; Cho et al., 2016). Like Cho et al. (2016), we have 
determined the crystal structure of the MERS-CoV X domain in complex with 
ADP-ribose (ADPr) (PDB entry: 5HOL; Fig. 4). Our structure and the 
ADPr-binding pattern are almost identical to the structure (PDB entry: 5DUS) 
described by Cho et al. (2016) and the structure of the SARS-CoV X domain in 
complex with ADPr (PDB entry: 2FAV; Egloff et al., 2006). The R.M.S.D. values 
are 0.4 Å (for 165 out of 165 Cα atoms; Z-score: 34.2) and 1.2 Å (for 163 out of 
171 Cα atoms; Z-score: 28.3), respectively, according to the Dali server (Holm 
& Rosenström, 2010). Here, we describe the structure of the MERS-CoV X 
domain in complex with ADPr from our own laboratory as an example (Fig. 4). 
The ADPr is located in a cleft at the top of the central β sheet (β7-β6-β3-β5). 
Five stretches of amino-acid residues are mainly involved in the binding of 
ADPr: I, Gly20-Ala22; II, Ala37-Asn39; III, Lys43-Ala49 (including a 
“45-GGG-47” triple-glycine motif); IV, Pro124-Phe131; V, Val153-Asn155 (Fig. 
4). The adenine base is in contact with regions I and V. In particular, the 
side-chain of Asp21 accepts a hydrogen bond from the exocyclic NH2 group in 
position 6 of the adenine, thereby fixing the orientation of the base. This Asp 
residue is conserved in macrodomains from bacteria, archaea, and eukaryotes 
(Saikatendu et al., 2005; Egloff et al., 2006). When the corresponding Asp20 in 
the macrodomain protein AF1521 of Archaeoglobus fulgidus was replaced by 
alanine, the ADP-ribose binding affinity was reduced almost 90-fold (Karras et 
al., 2005). The central ribose moiety is located between regions IV and V. The 
O2' of ADPr forms a hydrogen bond with a water molecule (H2O 308) that is 
stabilized by the side-chain of Asn155 (region V). The two phosphate groups 
accept a total of four hydrogen bonds from Ile48 (region III) and Gly129, Ile130 
as well as Phe131 (region IV). The distal ribose is in contact with regions II and 
III; the O1" and O2" of this ribose form hydrogen bonds with the amides of 
Gly47 and Gly45 (region III), respectively. The O3" forms a hydrogen bond 
with the side-chain amide of Asn39 (region II). Thus, Asp21 and Asn39 appear 
to fix the two ends of the ADP-ribose, thereby stabilizing its binding to the cleft 
(Fig. 4). Surprisingly, the orientation of the corresponding Asp in the 
HCoV-229E X domain is different; this Asp is in contact with its neighboring 
residue Thr22 rather than with the N6 atom of adenine and thus does not bind 
ADP-ribose directly (Piotrowski et al., 2009; Xu et al., 2009). This difference could 
explain why the binding affinity between the X domain of HCoV 229E and 
ADPr is about 10-fold lower than that of the MERS-CoV homologue 
(Piotrowski et al., 2009; Cho et al., 2016). Interestingly, the X domain from IBV 
strain M41 but not that of IBV strain Beaudette can bind ADPr (Xu et al., 2009; 
Piotrowski et al., 2009). The important “Gly-Gly-Gly” motif of the M41 X 
domain, involved in binding the distal ribose, is mutated to “Gly-Ser-Gly” in 
the Beaudette virus, thus preventing ADPr interaction with the X domain 
(Piotrowski et al., 2009). The virulence of IBV strain Beaudette is attenuated 
compared to that of IBV strain M41 (Geilhausen et al., 1973). It is an 
interesting hypothesis that the loss of the ability to bind ADPr may be one of 
the reasons for the lower pathogenicity of the former IBV. 

Macrodomains of some CoVs have been shown to exhibit a weak 
ADP-ribose-1"-phosphate phosphatase (ADRP) activity in vitro (kcat ≈ 5-20 
min⁻¹; Saikatendu et al., 2005; Egloff et al., 2006; Putics et al., 2006). The 
residue Asn41 of SARS-CoV (corresponding to the Asn39 in MERS-CoV 
mentioned above) is essential for ADRP activity (Egloff et al., 2006). However, 
the ADRP activity is dispensable for HCoV-229E replication in cell culture 
(Putics et al., 2005). On the other hand, when the ADRP activity of the 
HCoV-229E or that of the SARS-CoV X domain is inactivated through 
replacement of the Asn mentioned above by Ala, mutant viruses exhibit 
increased interferon α (IFN-α) sensitivity (Kuri et al., 2011). Interestingly, the 
corresponding mutants in MHV (strains A59 and JHM) and a mouse-adapted 
SARS-CoV do not show an increased IFN-β sensitivity (Eriksson et al., 2008; 
Fehr et al., 2015, 2016). 

Fehr et al. (2016) confirmed that the wild-type X domain of SARS-CoV inhibits 
the expression of innate-immunity genes (such as IFN-β, interleukin 6 (IL-6)) in 
vitro and thereby blocks the host immune response. At variance with this, 
Eriksson et al. (2008) and Fehr et al. (2015) reported that the Asn-to-Ala 
mutation in the MHV (strains A59 and JHM, resp.) X domain reduces the 
production of inflammatory cytokines (e.g., IL-6) in vitro and in vivo. Eriksson et 
al. (2008) hypothesized that the X domain aggravates MHV-induced severe 
liver pathology, likely by inducing the expression of inflammatory cytokines. 
These results suggest that the main function of the X domain may differ in 
different CoVs. On the other hand, the expression level of type-I IFN (α or β) is 
increased in cells infected with SARS-CoV or MHV carrying the Asn-to-Ala 
mutation in the X domain (Eriksson et al., 2008; Kuri et al., 2011; Fehr et al., 
2016). This indicates that suppression of innate immunity by the X domain may 
be a feature conserved across the coronaviruses. 

Recently, it was demonstrated that macrodomains from several +ssRNA 
viruses (such as HEV, SARS-CoV, HCoV 229E, Venezuelan equine 
encephalitis virus (VEEV), and Chikungunya virus (CHIKV)) act as hydrolases 
removing mono- and/or poly(ADP-ribose) from mono- or poly(ADP-ribosyl)ated 
proteins, activities designated as de-mono-ADP-ribosylation (de-MARylation) 
and de-poly-ADP-ribosylation (de-PARylation), respectively (Li et al., 2016a; 
Fehr et al., 2016; Eckei et al., 2017; McPherson et al., 2017). The weak ADRP 
activity described for the X domain in the literature is most probably just a 
non-physiological side reaction of de-MARylation and/or de-PARylation. 

The ADP-ribosylation (MARylation or PARylation) of proteins is a reversible 
posttranslational modification involved in various cellular processes (Aravind et 
al., 2015; Liu & Yu, 2015). Poly(ADP-ribose) polymerases (PARPs, also 
named ARTDs, ADP-ribosyltransferases diphtheria toxin-like) are responsible 
for transferring mono- or poly(ADP-ribose) to target proteins (Liu & Yu, 2015). 
For example, PARP7 (ARTD14), PARP10 (ARTD10), PARP12 (ARTD12), and 
PARP14 (ARTD8) add mono-ADPr to other proteins and themselves 
(Butepage et al., 2015), while PARP1 (ARTD1) and PARP2 (ARTD2) add 
poly-(ADPr)s (Gibson & Kraus, 2012). Various amino-acid residues have been 
identified as acceptor sites for ADP-ribosylation; this still seems to be a matter 
of some debate. Arg and Ser have certainly been shown to accept ADPr(s) 
(Laing et al., 2011; Leidecker et al., 2016), but the acidic residues are also 
thought to be important sites of ADP-ribosylation (Feijs et al., 2013). PARP7, 
10, and 12 can act as type-1 IFN-stimulated genes (ISGs) and inhibit VEEV 
replication (Atasheva et al., 2014). Also, Verheugd et al. (2013) reported that 
PARP10 can block the NF-κB pathway via MARylation of NEMO ("NF-κB 
essential modulator"). Moreover, the mRNA and protein synthesis of PARP14 
(ARTD8) and PARP10 are stimulated by IFN-α in vivo (Eckei et al., 2017). 
Therefore, some PARPs play a role in the host immune defense. Recently, it 
has been demonstrated that the X domains of SARS-CoV and HCoV 229E 
possess the ability to de-MARylate the ADP-ribosylated PARP10 catalytic 
domain in vitro (Fehr et al., 2016; Li et al., 2016a). However, the relationship 
between the de-MARylation function of viral macrodomains and their 
anti-innate immunity activity is still unclear. The de-MARylation activity is a 
common feature of the X domain (i.e., the first of the macrodomains if there is 
more than one) of all investigated macrodomain-encoding viruses (Li et al., 
2016a; Eckei et al., 2017). Interestingly, the macrodomains of VEEV and 
SARS-CoV can also remove the entire PAR chain from PARylated PARP5a, 
PARP1, and PARP3 (ARTD3), without releasing free monomeric ADPr (Li et 
al., 2016a). Therefore, the macrodomains of these two viruses hydrolyze the 
amino acid-ADPr ester bond but not ribose-ribosyl glycosidic bonds in PAR 
chains. A similar observation was also made for the macrodomain of CHIKV, 
although the de-PARylation of PARylated PARP1 was weak (Eckei et al., 
2017). Currently it is unknown whether the de-PARylation activity of 
macrodomains plays any role in the coronavirus life cycle. 

The conserved Asn42 residue, the triple-glycine 48-GGG-50 motif, and Gly123 
of the HEV macrodomain (corresponding to Asn39, 45-GGG-47, and Gly129 
of the MERS-CoV X domain mentioned above) are essential for the 
de-MARylation activity (Li et al., 2016a). This is not surprising because they 
are involved in binding ADP-ribose. A putative mechanism for the 
de-MARylation activity of the VEEV macrodomain has been proposed (Li et al., 
2016a). It is assumed that a water molecule performs a nucleophilic attack 
onto the C1'' atom of the mono(ADP-ribose). An equivalent water molecule 
(H2O 310) also exists in our structure of the MERS-CoV X domain-ADPr 
complex (Fig. 4). 

Interestingly, the neighboring helicase domain of HEV can increase the 
de-PARylation activity of the macrodomain by about 11-fold but not the 
de-MARylation activity, perhaps because the helicase can support binding of 
the PAR chain (Li et al., 2016a). This observation raises the question of whether 
a similar phenomenon exists in CoVs. Should the neighboring domains 
indeed have an influence on the de-MARylation/de-PARylation activities of the 
CoV X domain, this effect should differ between the various viruses, as there is 
little conservation of the neighboring regions. In addition, other CoV Nsps have 
been demonstrated to interact with the X domain. Using a GST pull-down assay, 
the X domain of SARS-CoV has been shown to bind the RNA-dependent RNA 
polymerase, Nsp12 (Imbert et al., 2008). If this interaction does exist in the 
virus life cycle, is it possible that the two proteins affect the enzymatic activity 
of each other? Although many three-dimensional structures of CoV 
macrodomains have been determined, more efforts should be made to study 
the biological functions of this domain. 

(II) Macrodomains II and III, and the DPUP (SUD-N, SUD-M, SUD-C) 

Within Nsp3, a non-conserved region follows the X domain (or Mac1). When 
the first SARS-CoV genome sequences were analyzed, this region was 
recognized as a unique domain only existing in SARS-CoV and therefore 
called “SARS-unique domain” (SUD) (Snijder et al., 2003). An alternative 
name is “Nsp3c” (Neuman et al., 2008). The three-dimensional structure of this 
region has been determined by X-ray crystallography and NMR spectroscopy 
(Table 1; Tan et al., 2009; Chatterjee et al., 2009; Johnson et al., 2010). This 
region includes three distinct subdomains: two macrodomains and one 
frataxin-like fold (Fig. 5A-C). The three subdomains were named SUD-N, 
SUD-M, and SUD-C, indicating the N-terminal, the middle, and the C-terminal 
region of SUD, respectively. A region corresponding to parts of SUD was 
found to exist in other coronaviruses, mostly of clades B, C, and D of the genus 
Betacoronavirus (Neuman, 2016). For example, domains similar to SUD-M 
and SUD-C (but not SUD-N) are also encoded by the MERS-CoV genome 
(Kusov et al., 2015; Ma-Lauer et al., 2016). Thus, it is no longer appropriate to 
call this domain "SARS-unique". Recently, the Nsp3 of MHV was shown by 
X-ray crystallography to contain a SUD-C-like fold (Chen et al., 2015). These 
authors renamed this region "Domain Preceding Ubl2 and PL2 pro" (DPUP). 
In this review, we follow the nomenclature proposed by Chen et al. (2015) and 
Neuman (2016), and use the designations macrodomain II (Mac2), 
macrodomain III (Mac3), and Domain Preceding Ubl2 and PL2 pro (DPUP) for 
SUD-N, SUD-M, and SUD-C, respectively. 

Mac2 (SUD-N) has been shown to be dispensable for the SARS-CoV 
replication/transcription complex within the context of a SARS-CoV replicon, 
but surprisingly, Mac3 (SUD-M) is essential, even though it is not conserved 
throughout the coronaviruses (Kusov et al., 2015). Mac2 and Mac3 each 
display a typical α/β/α macrodomain fold (Fig. 5A and B). The central β sheet 
with six β strands in the order β1-β6-β5-β2-β4-β3 is flanked by two (or three) 
helices on either side. Only the last strand, β3, is antiparallel to the other 
strands. Interestingly, Mac2 and Mac3 have the same number of β strands in 
the central β sheet as the X domain of IBV (see above for X domain of IBV). 
The R.M.S.D. values are 2.5 Å - 2.6 Å (for 119/171 Cα atoms) between Mac2, 
Mac3, and the X domain of SARS-CoV, according to the Dali server (Holm & 
Rosenström, 2010). The corresponding values are 2.6 Å - 2.7 Å (for 120/165 
Cα atoms) when comparing SARS-CoV Mac2 and Mac3 with the X domain of 
IBV. Although the X-domain and Mac2/3 share the same fold, the sequence 
identity among them is only about 11% (Tan et al., 2009). All the residues 
important for binding ADP-ribose and for de-MARylation/de-PARylation activity 
(such as the Asn residue and the “GGG” triple-glycine motif interacting with the 
distal ribose, as mentioned above) are not conserved in Mac2/3; therefore 
Mac2/3 cannot bind ADP-ribose (Tan et al., 2009; Chatterjee et al., 2009). 
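
The R.M.S.D. values quoted above summarize how far paired Cα atoms lie from each other after structural superposition. As a minimal sketch of the final step only, the following Python function computes the R.M.S.D. for two coordinate sets that are assumed to be already superimposed and paired atom by atom; the alignment and superposition performed by the Dali server are not reproduced here, and the toy coordinates are hypothetical, not taken from any deposited structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumes both sets are already superimposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in Angstrom: every paired atom is offset by 1 A in y.
domain_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
domain_2 = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(round(rmsd(domain_1, domain_2), 2))
```

Because the value is averaged only over the aligned atom pairs (e.g., 119 of 171 Cα atoms), the number of pairs should always be reported alongside the R.M.S.D., as is done in the text.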

Currently, most known functions of Mac2/3 are connected with RNA binding. 
Mac2-3 (SUD-NM) preferentially binds oligo(G), which can form 
G-quadruplexes; as expected for these structural modules, the binding affinity 
is enhanced by potassium ions (Tan et al., 2007, 2009). According to a 
mutational study, two positively charged lysine-patches of Mac2 are involved 
in oligo(G) binding, i.e. Lys476+Lys477 (in the loop between α3 and β5; 
residue numbering starts at N-terminus of Nsp3) and Lys505+Lys506 (at the 
end of α4), while the residues Lys563+Lys565+Lys568 (+Glu571) of Mac3 
(located between α2 and β3) are absolutely essential for binding (Fig. 5B; Tan 
et al., 2009). Moreover, working with the SARS-CoV replicon, our laboratory 
has shown that mutation of the same lysine patch of Mac3 in the context of the 
replicon completely abolished SARS-CoV replication, indicating that binding of 
G-quadruplex RNA could be an essential element of RTC activity (Kusov et al., 
2015). Also, Mac3 can bind (GGGA)2 and (GGGA)5 as well as (GGGA)2GG 
(Johnson et al., 2010). In contrast, Mac3-DPUP (SUD-MC; DPUP: SUD-C, 
see below) only binds (GGGA)2GG but not (GGGA)2 or (GGGA)5. A 3'-terminal 
G nucleotide is apparently important for binding to Mac3-DPUP (Johnson et 
al., 2010). These data indicate that the DPUP subdomain may fine-tune the 
specificity of RNA binding by Mac3 (Johnson et al., 2010). 

The SARS-CoV genome contains three G6-stretches and two G5-stretches 
(Tan et al., 2009; Johnson et al., 2010), but none of them is conserved in all 
SARS-CoV strains. However, two GGGAGGGUAGG nucleotide segments, 
located in the Nsp2 and Nsp12 coding sequences, are highly conserved in 
various SARS-CoV strains (Johnson et al., 2010). These two nucleotide 
segments differ by only one base from the sequence favored by Mac3-DPUP, 
(GGGA)2GG. Johnson et al. (2010) therefore proposed that these two 
sequences could be potential physiological substrates of Mac3-DPUP. 
Besides specific elements in the genome of SARS-CoV, Mac2-3 might bind 
G-rich stretches in host mRNAs. In fact, Mac2-3 prefers to bind longer 
G-stretches, such as (G)10 to (G)14 (Tan et al., 2007). Such long G-stretches 
exist in several 3’ non-translated regions of host mRNAs, such as the NF-κB 
signaling pathway-related protein TAB3 mRNA and apoptotic signaling 
pathway protein Bbc3 mRNA (Tan et al., 2007, 2009). Mac2-3 may regulate 
the expression of these genes by binding to the poly(G) stretches in the 
corresponding mRNAs, thereby leading to disruption of the host antiviral 
response as well as of apoptotic signals. 
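
The sequence comparisons in this paragraph can be illustrated with a short Python sketch. It locates G-stretches of a given minimum length in an RNA sequence and computes the edit distance between the preferred Mac3-DPUP ligand (GGGA)2GG and the conserved GGGAGGGUAGG segment. The helper names `g_runs` and `edit_distance` and the toy RNA are hypothetical and serve illustration only; no real genome sequence is used.

```python
import re


def g_runs(seq, min_len=5):
    """Return (start, length) for every maximal run of G with at least min_len bases."""
    return [
        (m.start(), len(m.group()))
        for m in re.finditer(r"G+", seq.upper())
        if len(m.group()) >= min_len
    ]


def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, and deletions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


preferred = "GGGA" * 2 + "GG"   # (GGGA)2GG, the sequence favored by Mac3-DPUP
conserved = "GGGAGGGUAGG"       # conserved segment quoted in the text
print(edit_distance(preferred, conserved))

# Hypothetical toy RNA containing one G6-stretch, one G5-stretch, and one G3-stretch.
toy_rna = "AU" + "G" * 6 + "CAU" + "G" * 5 + "AA" + "G" * 3
print(g_runs(toy_rna))
```

In this sketch, the single-base difference between the two sequences shows up as one inserted U, so an edit distance is used rather than a position-by-position comparison of equal-length strings.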

Mac3 has also been reported to bind oligo(A) (Chatterjee et al., 2009; Johnson 
et al., 2010). This observation (which is not in agreement with the results 
reported by Tan et al. (2007, 2009)) might suggest that Mac3 binds the poly(A) 
tail of the viral genome, or of subgenomic mRNAs, or of host mRNA. 
Poly(A)-binding protein (PABP) binds the genomic poly(A) tails of BCoV 
(bovine coronavirus), MHV, and TGEV, thereby enhancing the replication of 
these viruses (Spagnolo & Hogue, 2000; Galan et al., 2009). Is it possible that 
Mac3 binding to oligo(A) competes with the binding between PABP and the 
poly(A) tail? The question is yet to be answered. 

Besides binding to nucleic acids, Mac2-3 of SARS-CoV has been shown to 
interact directly with host proteins, e.g. the E3 ubiquitin ligase RCHY1 
(Ma-Lauer et al., 2016). RCHY1 and several other host proteins (e.g., Paip1, 
MKRN2, and MKRN3) were reported to interact with Nsp3 (Pfefferle et al., 
2011). However, the detailed binding region(s) on Nsp3 have not been 
identified. Ma-Lauer et al. (2016) demonstrated that Mac2-3 and the PL2 pro of 
Nsp3 bind RCHY1, thus resulting in down-regulation of the antiviral protein p53 
(see below). It is an interesting hypothesis that such 
interactions, which are absent from other CoVs because they lack Mac2-3, 
might account for a unique pathogenicity-related pathway utilized by 
SARS-CoV. 

The DPUP (SUD-C) follows the Mac3 domain in SARS-CoV (Fig. 1A). Deletion 
of the domain within the context of a SARS-CoV replicon leads to a large 
reduction of RNA synthesis, but some basal RTC activity remains, indicating 
that the DPUP is not absolutely essential for replication (Kusov et al., 2015). 
Currently, three DPUP structures are available, one each from SARS-CoV and 
MHV (Table 1; Fig. 5C and D; Johnson et al., 2010; Chen et al., 2015), and the 
third one from bat coronavirus HKU9 (Table 1; Hammond et al., 2017). All 
DPUPs adopt a similar topology and overall structure. The R.M.S.D. values 
between SARS-CoV DPUP and that of MHV or HKU9 are 2.1 Å (for 62 out of 
74 Cα atoms; Z-score: 7.1) or 2.0 Å (for 62 out of 77 Cα atoms; Z-score: 7.0), 
respectively, according to the Dali server (Holm & Rosenström, 2010). The 
DPUP consists of an anti-parallel β sheet with two α helices located N- and C- 
terminal to this β sheet (Johnson et al., 2010; Chen et al., 2015). The two α 
helices form one plane while the β sheet forms the other; this resembles a 
typical frataxin-like fold (Bencze et al., 2006). Proteins featuring the 
frataxin-like fold are commonly involved in controlling cellular oxidative stress 
by binding iron to maintain the iron homeostasis (Bencze et al., 2006). In the case 
of the yeast frataxin homologue Yfh1, cells lacking this gene were 
demonstrated to be highly sensitive to H2O2 and elevated metal ion levels 
(such as iron and copper) (Foury & Cazzalini, 1997). Several Glu and Asp 
residues in the N-terminal α helix of Yfh1 are possibly involved in binding metal 
ions (Fig. 5E; He et al., 2004; Bencze et al., 2006). Interestingly, “EEXXXE” 
and “DDD” motifs exist in the first helix of the SARS-CoV and MHV DPUP, 
respectively, even though the sequence identity of DPUP is only 13% between 
these two viruses. Neuman et al. (2008) found that SARS-CoV 
Mac2-Mac3-DPUP can bind cobalt ions, while Mac3 alone and Mac2*-Mac3 
(2*: C-terminal half of Mac2) cannot (Neuman et al., 2008). According to these 
observations, it is conceivable that the DPUP region binds metal ions. 
Furthermore, infection with SARS-CoV can induce transcription of oxygen 
stress-related genes of the host (Hu et al., 2012). Any involvement of DPUP in 
this biological process is speculative at this time. 

The Mac2-3-DPUP oligodomain (SUD) has been shown to interact with Nsp9, 
Nsp12, and NAB-βSM-TM1 (see below) of Nsp3 by using a GST pull-down 
assay (Imbert et al., 2008). Using Y2H and co-immunoprecipitation (CoIP) 
assays, the oligoprotein Ubl1-HVR-Mac1-2-3* (3*, N-terminal third of Mac3) 
of SARS-CoV Nsp3 has been found to bind Nsp2, ORF3a, and ORF9b (von 
Brunn et al., 2007); however, with the slightly larger region 
Ubl1-HVR-Mac1-2-3-DPUP, these interactions were not confirmed in a Y2H 
assay (Pan et al., 2008). It seems that DPUP might modulate the various 
binding processes. Furthermore, the DPUP subdomain could also regulate the 
sequence specificity of RNA binding by Mac3 as mentioned above (Johnson et 
al., 2010). 

The relative orientation of SARS-CoV Mac2 and Mac3 is fixed by an artificial 
disulfide bond and dimer formation in the crystal (Tan et al., 2009). The NMR 
structure shows that Mac2 and Mac3 as well as Mac3 and DPUP have no 
preferred relative orientations to one another (Johnson et al., 2010). However, 
Mac2, Mac3, and DPUP are surrounded by other domains within Nsp3; it is 
unclear whether these other domains affect the relative orientation among the 
three. More multi-domain structures will be needed to answer this question and 
to elucidate the structural basis of mutual influences of these modules onto 
each other (see, e.g., above for the influence of the HEV helicase on the 
macrodomain of this virus). 

5. Ubiquitin-like domain 2 and papain-like protease 2 

Besides the Mac1 (X domain), the largest number of crystal structures for any 
Nsp3 domain has been determined for the ubiquitin-like domain 2 (Ubl2) plus 
the papain-like protease 2 (PL2 pro). So far, structures of this region are 
available for SARS-CoV, MERS-CoV, IBV, and MHV (Table 1; Ratia et al., 
2006; Lei et al., 2014; Kong et al., 2015; Chen et al., 2015). Ubl2 and PL2 pro 
are conserved in all CoVs (Neuman et al., 2008; Neuman, 2016). The exact 
functional role of the Ubl2 domain is not clear so far, while the PL2 pro was 
reported to possess proteolytic, deubiquitinating, and deISGylating activities 
(Barretto et al., 2005; Lindner et al., 2005; Yang et al., 2014; Mielech et al., 
2014). 

(I) Ubiquitin-like domain 2 (Ubl2) 

The Ubl2 is the second ubiquitin-like subdomain located within Nsp3 (Fig. 2C 
and 6). The structures of Ubl2 in different CoVs are more conserved than those 
of Ubl1. For example, the R.M.S.D. between the Ubl2s of SARS-CoV and 
MHV is 1.2 Å (for 58 out of 68 Cα atoms; Z-score: 11.1) according to the Dali 
server (Holm & Rosenström, 2010), whereas the corresponding value for the 
Ubl1s of the two viruses is 2.8 Å (for 85 out of 93 Cα atoms; Z-score: 7.5). 
Some host USPs (with a fold similar to the CoV PL pro ) also include one or more 
Ub-like domain(s), which is/are used to regulate the catalytic activity as well as 
to interact with partners (Komander et al., 2009; Faesen et al., 2012; Pfoh et 
al., 2015). For example, the N-terminal Ubl domain of USP14 is critical for its 
recruitment at the proteasome, thereby enhancing its catalytic activity (Hu et 
al., 2005; Faesen et al., 2012). USP7 (also named “HAUSP”: 
Herpesvirus-associated USP) includes five Ubl domains (Ubl1-5), which are 
located at the C-terminus of the protease domain. Ubl4-5 promote Ub binding 
and enhance the DUB activity of USP7 by about 100-fold via interacting with 
the “switching loop” (Trp285-Phe291) in the USP7 catalytic domain (Faesen et 
al., 2012). Ubl2 of USP7 interacts with the HSV-1 immediate-early protein 
ICP0 to antagonize the host antiviral response (Pfoh et al., 2015). In contrast 
to the variable relative orientations of the Ubl domains and the catalytic domain 
of USP7, the Ubl2 domain is anchored to the CoV PL2 pro by two salt-bridges in 
MERS-CoV and SARS-CoV (Lei et al., 2014), so it is unlikely to regulate the 
catalytic activity of PL2 pro . In agreement with this conclusion, the presence or 
absence of the Ubl2 of SARS-CoV or MERS-CoV shows almost no effect on 
the PL2 pro activities (Frieman et al., 2009; Clasman et al., 2017). 

Currently, inconsistent findings on the role of Ubl2 have been reported. Frieman et al. 
(2009) demonstrated that the Ubl2 of SARS-CoV is necessary to antagonize 
the host innate immune response via blocking IRF3 or the NF-κB pathway. In 
contrast, Clementz et al. (2010) reported that the Ubl2 of SARS-CoV is not 
necessary for antagonizing IFN production. Also, Mielech et al. (2015) showed 
that the Val787Ser mutation (Nsp3 numbering) in the MHV Ubl2 reduces the 
thermal stability of the PL2 pro, whereas Clasman et al. (2017) reported that the 
Ubl2 of MERS-CoV does not affect PL2 pro thermal stability. The Val 
residue mutated in MHV is conserved in SARS-CoV and MERS-CoV. It is located in the 
first strand (β1) and contributes to the hydrophobic core of Ubl2; therefore, the 
Val-to-Ser change might disrupt the global Ubl2 structure, leading to a 
decrease in the stability of the PL2 pro domain (Mielech et al., 2015). 

On the basis of molecular dynamics simulations, the MERS-CoV Ubl2 has 
recently been proposed to display more molecular flexibility when the PL2 pro 
binds ubiquitin, compared to the situation in the free enzyme. The authors 
speculate that the difference in flexibility of the Ubl2 might regulate the 
interaction with downstream targets, thereby modulating the innate immune 
response (Alfuwaires et al., 2017). Ubiquitination and deubiquitination 
regulate not only the immune response but also the cell cycle, DNA damage repair, 
cellular growth etc. (Welchman et al., 2005), and these processes will involve a 
large number of host proteins. Among these, the coronavirus PL2 pro should 
select its specific targets, such as the host innate-immune system-related 
proteins TRAF3, STING, TBK1, IRF3 etc. (Chen et al., 2014; Lei & Hilgenfeld, 
2017), with the goal of facilitating efficient virus survival. We therefore 
speculate that the Ubl2 might act as a modulator helping the PL2 pro recognize 
its specific targets during coronavirus infection. However, this idea needs to be 
verified by future research. 

(II) Papain-like protease 2 (PL2 pro) 

The PL2 pro adopts an extended right-hand fold with thumb, palm, and fingers 
subdomains, similar to the TGEV PL1 pro (Fig. 6; Ratia et al., 2006; Lei et al., 
2014; Lee et al., 2015; Kong et al., 2015; Chen et al., 2015; Clasman et al., 
2017) and human USPs (e.g. USP14, USP7; Ratia et al., 2006). A zinc ion is 
coordinated by four cysteines from two β hairpins in the fingers subdomain and 
forms a zinc-finger motif. Although the conformations of the zinc finger are 
variable between different PL2 pro s (Lei et al., 2014; Lee et al., 2015; Kong et al., 
2015; Chen et al., 2015), the motif is essential for structural stability and 
proteolytic activity (Barretto et al., 2005). The catalytic site of PL2 pro comprises 
the typical Cys-His-Asp triad, just like the PL1 pro of TGEV (see above). The 
catalytic Cys is located in the thumb subdomain (at the N terminus of helix 4 of 
SARS-CoV and MERS-CoV PL2 pro ; Ratia et al., 2006; Lei et al., 2014), 
whereas the His as well as the Asp are located in the palm subdomain. In the 
free PL2 pro , the catalytic triad Cys-His-Asp is pre-formed, different from USP7, 
where the catalytic residues are only well aligned upon Ub binding to the 
enzyme (Hu et al., 2002). As we mentioned above, the oxyanion hole of 
papain-like proteases normally comprises a Gln or Asn side-chain 5 or 6 
residues N-terminal to the catalytic Cys. This situation is found in the MHV 
PL2 pro (Chen et al., 2015), but the corresponding residues are Trp, Leu, and 
Trp in the enzymes of SARS-CoV, MERS-CoV, and IBV, respectively (Ratia et 
al., 2006; Lei et al., 2014; Kong et al., 2015). Nevertheless, the indole-ring 
nitrogen of Trp can form a hydrogen bond with the oxyanion intermediate of 
substrate hydrolysis. The protease activity of the SARS-CoV PL2 pro is 
abolished upon a Trp-to-Ala mutation (Ratia et al., 2006). In contrast, the Leu 
of MERS-CoV PL2 pro totally lacks the ability to contribute to oxyanion 
stabilization via a hydrogen bond (Lei et al., 2014). The deficient oxyanion hole 
of MERS-CoV PL2 pro results in an about 100-fold lower proteolytic activity 
compared to that of the SARS-CoV PL2 pro when using 
Arg-Leu-Arg-Gly-Gly-7-amino-4-methylcoumarin (RLRGG-AMC) as a 
substrate (Baez-Santos et al., 2014). Meanwhile, the corresponding activity of 
the Leu-to-Trp mutant of MERS-CoV PL2 pro is about 50-fold higher than that 
of the wild-type enzyme, using the same substrate (Lei et al., 2014). As we 
mentioned before (Lei & Hilgenfeld, 2016), the efficiency of viral proteases 
does not always have to be optimized during virus evolution. Rather, the 
creation of temporary intermediates of polyprotein cleavage, in the right 
temporal order, is necessary for correct virus replication (Kanjanahaluethai & 
Baker, 2000; Gosert et al., 2002; Harcourt et al., 2004); thus, the proper but 
not necessarily the highest protease activity is beneficial for virus survival. 
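
The two fold-changes quoted in this paragraph can be combined in a back-of-the-envelope calculation. The sketch below normalizes the SARS-CoV PL2 pro activity towards RLRGG-AMC to 1.0 (an arbitrary convention introduced here) and assumes that the measurements from the two cited studies are directly comparable, which is not stated in the source; the numbers are the approximate values from the text, not new data.

```python
# Relative RLRGG-AMC activities, normalized to SARS-CoV PL2pro = 1.0 (arbitrary unit).
# Fold-changes are the approximate values quoted in the text; combining them assumes
# comparable assay conditions in the two cited studies (an assumption of this sketch).
sars_wt = 1.0
mers_wt = sars_wt / 100.0          # about 100-fold lower than SARS-CoV PL2pro
mers_leu_to_trp = mers_wt * 50.0   # about 50-fold higher than MERS-CoV wild type

print(f"MERS-CoV wild type:  {mers_wt:.2f}")
print(f"MERS-CoV Leu-to-Trp: {mers_leu_to_trp:.2f}")
```

Under these assumptions, the Leu-to-Trp mutant would reach roughly half of the SARS-CoV level, suggesting that the deficient oxyanion hole could account for most, though perhaps not all, of the difference between the two wild-type enzymes.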

In order to investigate the mechanism of the DUB and deISGylating activities 
of CoV PL pro s, structures of the enzyme in complex with ubiquitin (or ISG15) are 
important. Until now, structures of SARS-CoV and MERS-CoV PL2 pro with 
mono-Ub as well as of SARS-CoV PL2 pro with di-Ub have been obtained 
(Chou et al., 2014; Ratia et al., 2014; Bekes et al., 2016; Bailey-Elkin et al., 
2014; Lei & Hilgenfeld, 2016). Very recently, the structure of SARS-CoV PL2 pro 
in complex with the C-terminal Ubl domain of hISG15 or mISG15 has also 
been reported (Daczkowski et al., 2017). These structures show that the 
PL2 pro of SARS-CoV possesses two ubiquitin-binding sites (named Ub1 and 
Ub2 sites here; Ratia et al., 2014; Bekes et al., 2016). From the prior structure 
of USP14 in complex with ubiquitin, it is known that two blocking loops (BL1 
and BL2) regulate substrate binding (Hu et al., 2005). Different from that, only 
the BL2 exists in CoV PL2 pro s and is involved in substrate binding (Fig. 6; Chou 
et al., 2014; Ratia et al., 2014; Bailey-Elkin et al., 2014; Lei & Hilgenfeld, 2016), 
whereas BL1 is absent in CoV PL2 pro s (Ratia et al., 2006; Lei et al., 2014). 

The proximal Ub binding site (Ub1) is, to a certain degree, conserved between 
the PL2 pro s of SARS-CoV and MERS-CoV. The region includes the narrow 
substrate channel between the thumb and the palm subdomains, as well as a 
hydrophobic patch in the fingers subdomain (Fig. 6). The narrow substrate 
channel binds the C-terminal RLRGG residues of ubiquitin (Chou et al., 2014; 
Ratia et al., 2014; Bailey-Elkin et al., 2014; Lei & Hilgenfeld, 2016; in order to 
be clear, Ub residues appear in italics here). The C-terminal RLRGG of 
ubiquitin is similar to the unprimed side of the polyprotein substrates, 
(R/K)(L/I)XGG in the two viruses. The S1, S2, and S4 pockets are well 
conserved to accommodate the two small glycines (P1, P2) and the 
hydrophobic P4 residue (Leu or Ile). In contrast, the flexible side-chains in P3 
and P5 feature binding patterns that are slightly different between SARS-CoV 
and MERS-CoV PL2 pro . In the SARS-CoV PL2 pro (Cys112Ser)-Ub complex, 
P3-Arg forms a weak salt-bridge with Glu162 (Chou et al., 2014), whereas the 
corresponding P3-Arg is exposed to solvent in the MERS-CoV complex (Lei & 
Hilgenfeld, 2016). On the other hand, the P5-Arg is exposed to solvent in the 
SARS-CoV complex (Chou et al., 2014) but forms a strong salt-bridge with 
Asp164 in MERS-CoV (Bailey-Elkin et al., 2014; Lei & Hilgenfeld, 2016). 
Interestingly, this Asp164 is unique among CoV PL2 pro s, and the Asp164Ala 
replacement leads to an about 4.5-fold and 3.5-fold reduction of the proteolytic 
and DUB activities, respectively (Lei & Hilgenfeld, 2016). As just mentioned, 
the proteolytic activity of the MERS-CoV PL2 pro is not optimized due to the 
deficient oxyanion hole. On the other hand, the virus requires a strong DUB 
activity to counteract the host immune response. The suboptimal enzyme 
activities may be partly compensated by the unique Asp164 (Lei & Hilgenfeld, 
2016). 
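
The unprimed-side consensus (R/K)(L/I)XGG described above maps directly onto a regular expression. The following Python sketch scans a protein sequence for this P5-P1 pattern; the function name `find_sites` and the toy sequence are hypothetical, and a sequence match is only a necessary condition, since actual cleavage also depends on structural context that is not modeled here.

```python
import re

# (R/K)(L/I)XGG: P5 = Arg/Lys, P4 = Leu/Ile, P3 = any residue, P2 = P1 = Gly.
# The lookahead makes overlapping occurrences visible as separate matches.
MOTIF = re.compile(r"(?=([RK][LI].GG))")


def find_sites(protein_seq):
    """Return (0-based index of the P1 Gly, matched P5-P1 string) for each motif hit."""
    return [(m.start() + 4, m.group(1)) for m in MOTIF.finditer(protein_seq)]


ub_c_terminus = "RLRGG"            # C-terminal residues of ubiquitin quoted in the text
toy_seq = "MSTKIAGGAAARLRGGSS"     # hypothetical sequence, for illustration only

print(find_sites(ub_c_terminus))
print(find_sites(toy_seq))
```

The P3 position is left unconstrained in the pattern, in line with the observation that the P3 side-chain is bound differently by the SARS-CoV and MERS-CoV enzymes.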

In addition to the binding of the Ub C-terminus to the substrate channel, there 
is an interaction between a hydrophobic region of the SARS-CoV and 
MERS-CoV PL2 pro s in the fingers subdomain and a hydrophobic patch (Ile44, 
Ala46, Gly47) of Ub (Chou et al., 2014; Ratia et al., 2014; Bailey-Elkin et al., 
2014; Lei & Hilgenfeld, 2016; Bekes et al., 2016). This hydrophobic patch of 
Ub is commonly used to interact with Ub-binding proteins (Dikic et al., 2009). 
The fingers subdomain residues involved are Tyr208 and Met209 in 
SARS-CoV, and Tyr209 and Val210 in MERS-CoV (Chou et al., 2014; Ratia et 
al., 2014; Bailey-Elkin et al., 2014; Lei & Hilgenfeld, 2016; Bekes et al., 2016). 
Moreover, these hydrophobic interactions between the PL2 pro and Ub are 
important for the DUB activity of the enzyme, because disrupting them via a 
Val210Arg mutation dramatically diminishes the DUB activity in MERS-CoV 
PL2 pro (Bailey-Elkin et al., 2014). 

Near the hydrophobic patch of Ub, Arg42 forms a salt-bridge with Glu168 of 
PL2 pro in two structures of the SARS-CoV PL2 pro in complex with 
mono-ubiquitin or Lys48-linked di-Ub (Chou et al., 2014; Ratia et al., 2014; 
Bekes et al., 2016). However, this Glu is replaced by Arg in MERS-CoV PL2 pro , 
resulting in Arg42 instead forming a salt-bridge with Asp165 in the MERS-CoV 
PL2 pro-ubiquitin complex (Lei & Hilgenfeld, 2016). This illustrates that various 
fine-tuned binding patterns exist between Ub and PL2 pro s in different CoVs. 

Besides the Ub1 binding site, the Ub2 binding site is mapped by the complex 
of SARS-CoV PL2 pro with Lys48-linked di-Ub (Fig. 6; Bekes et al., 2016). The 
Ub2 binding site is located at the first α helix of the thumb subdomain. Phe70 
interacts with the common hydrophobic patch (Ile44, Ala46, Gly47) of Ub. 
Interestingly, MERS-CoV PL2 pro seems to lack the corresponding Ub2 binding 
site. Phe70 of SARS-CoV PL2 pro is changed to Lys69 in MERS-CoV (Bekes et 
al., 2016). In addition, Bekes et al. (2016) predicted that Trp107 and Ala108 
could constitute the Ub1' binding site in SARS-CoV PL2 pro. The 
Trp107Leu/Ala108Ser double mutation reduces the enzyme's activity towards 
Lys48-linked tri-Ub-AMC by about 75% (Bekes et al., 2016). However, it 
should be noted that Trp107 contributes to the oxyanion hole of SARS-CoV 
PL2 pro (see above); therefore, the reduced DUB activity upon replacing Trp107 
by Leu is perhaps not due to altering the Ub1' binding site, but rather to 
destroying the oxyanion hole. 

The SARS-CoV PL2 pro displays more efficient cleavage activity towards 
Lys48-linked di-Ub-AMC than Lys63-linked di-Ub-AMC substrates in vitro, 
demonstrating that the PL2 pro preferentially recognizes Lys48-linked polyUb 
chains (Baez-Santos et al., 2014; Bekes et al., 2015, 2016). In contrast, 
MERS-CoV PL2 pro processes Lys48- and Lys63-linked polyUb chains with 
similar efficiency (Baez-Santos et al., 2014). Lys48-linked Ub chains mainly 
cause target protein degradation via the 26S proteasome, while Lys63-linked 
polyUb is mainly related to DNA repair and signal transduction (Ikeda & Dikic, 
2008), in particular, in the signal transduction cascades of the host innate 
immune system (Dikic & Dotsch, 2009). However, the biological significance of 
the CoV PL pro s showing different cleavage activities on Lys48- and 
Lys63-linked polyUb is still unclear. Furthermore, the SARS-CoV PL2 pro 
cleaves the polyUb chain by removing di-Ubs, not mono-Ub units as in 
MERS-CoV (Bekes et al., 2015). This strongly suggests that MERS-CoV 
PL2 pro possesses the Ub1 and Ub1' binding sites but not a Ub2 site, consistent 
with the Phe70 to Lys mutation in MERS-CoV PL2 pro as just mentioned. 

At the same time, ISG15 utilizes a different Ub2 binding site of SARS-CoV 
PL2 pro , compared to Lys48-linked di-Ub (Bekes et al., 2016), but no structure 
for a full-length ISG15-CoV PL2 pro complex is available so far. Daczkowski et 
al. (2017) reported that the C-terminal domains of ISG15s (similar to Ub1 
mentioned above) from different species have different binding characteristics 
with SARS-CoV PL2 pro according to two structures, the PL2 pro in complex with 
the C-terminal domain of hISG15 and mISG15, respectively. In addition, the 
structure of mouse USP18 in complex with full-length mISG15 became 
available this year (Basters et al., 2017). Surprisingly, the N-terminal Ubl 
domain of mISG15 shows almost no interaction with mUSP18. Does ISG15 
behave similarly when binding to the CoV PL pro ? How does the N-terminal 
domain of ISG15 of different species recognize the cognate CoV PL pro ? It 
would be of interest to determine not only the structure of a full-length 
hISG15-HCoV PL pro complex but also that of mISG15 with MHV PL pro. 

The DUB and deISGylating activities of CoV PL pro s are well established, but 
the detailed mechanism of the PL pro antagonism of the host innate immune 
response is still ambiguous (see Lei & Hilgenfeld, 2017, for a recent review). 
Various cytokines (including interferons (IFNs) and tumor necrosis factors 
(TNFs)) are produced to inhibit virus replication by two main pathways, the 
IRF3 pathway and the NF-kB pathway (Seth et al., 2006; Hiscott et al., 2006). 
For more information on the host innate immune system signaling pathways, 
the reader should consult other reviews (e.g., Mogensen, 2009; Lei & 
Hilgenfeld, 2017). Devaraj et al. (2007) found that the SARS-CoV PL2 pro can 
directly bind IRF3 to block its phosphorylation, dimerization, and nuclear 
translocation, thereby inhibiting IFN-(B induction. Furthermore, the PL2 pro was 
found not to block the NF-kB signaling pathway and the protease activity was 
described as dispensable for antagonizing the IFN response (Devaraj et al., 
2007). Clementz et al. (2010) also confirmed that the enzyme activity of 
HCoV-NL63 PL2 pro is not essential for counteracting the antiviral IFN 
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production. In contrast, Frieman et al. (2009) reported that the SARS-CoV 
PL2 pro does not directly bind IRF3 or disrupt its phosphorylation. Instead, the 
PL2 pro was proposed to inhibit the NF-kB signaling pathway by stabilizing its 
inhibitor, kBct (Frieman et al., 2009). Furthermore, the protease activity of 
SARS-CoV PL2 pro is important for blocking the TNF-o/NF-kB signaling 
pathway (Frieman et al., 2009). In addition, the HCoV-NL63 but not the MHV 
PL2 pro has the ability to impede the IRF3 and NF-kB pathways, indicating that 
the functions of the PL2 pro are specific for different CoVs (Frieman et al., 2009). 
Later, a protein comprising the SARS-CoV PL2 pro and the TM (transmembrane 
region of Nsp3) was demonstrated to inhibit the STING/TBKI/IKKe-mediated 
signaling pathway (upstream regulators of IRF3; Chen et al., 2014), thereby 
disrupting IRF3 phosphorylation and dimerization, and blocking the type-1 IFN 
response. SARS-CoV PL2 pro plus TM can also physically interact with the 
STING-TRAF3-TBK1 complex and remove the ubiquitins from ubiquitinated 
RIG-1, STING, TRAF3, TBK1, as well as IRF3 (Chen et al., 2014). In 2016, it 
was reported that the SARS-CoV PL2 pro can inhibit the Toll-like receptor 7 
(TLR7)-mediated type-1 IFN response and the NF-kB pathway by removing 
the Lys63-linked polyUb chain from TRAF3 and TRAF6 (upstream regulators 
of IRF3 and NF-kB; Li et al., 2016b). Interestingly, the SARS-CoV PL2 pro only 
removes the Lys63- but not the Lys48-linked polyUb chain from TRAF3 and 
TRAF6 in vivo (Li et al., 2016b). On the other hand, Baez-Santos et al. (2014)
and Bekes et al. (2015, 2016) have shown that SARS-CoV PL2 pro prefers to
digest Lys48- over Lys63-linked polyUb chains in vitro (see above). Why does
the substrate specificity of PL2 pro seem to be different in vivo and in vitro?
Does any other factor influence the substrate specificity of PL2 pro in vivo when
counteracting the cellular innate immune response? These questions are yet
to be answered.

In addition, the HCoV-NL63 PL2 pro was shown to block the p53-IRF7-IFN-β
signaling pathway (Yuan et al., 2015). p53 can induce type-I interferon
production via IRF7 (interferon regulatory factor 7; Yuan et al., 2015).
Meanwhile, p53 can be degraded via the ubiquitin-proteasome system mediated
by MDM2, an E3 ubiquitin ligase (Haupt et al., 1997). Yuan et al. (2015)
found that the HCoV-NL63 PL2 pro deubiquitinates and stabilizes MDM2 to
augment p53 degradation, thereby antagonizing the host innate immune
response. Recently, the PL2 pro of SARS-CoV and MERS-CoV as well as the
PL1 pro/PL2 pro of HCoV-NL63 were shown to directly interact with the host E3
ubiquitin ligase RCHY1 (also called Pirh2; Ma-Lauer et al., 2016), thereby
increasing the stability of the latter. Like MDM2, RCHY1 can induce p53
degradation (Leng et al., 2003). Ma-Lauer et al. (2016) found that p53
inhibits the replication of SARS-CoV. Stabilization of RCHY1 by physical
interaction with the PL2 pro increases the degradation of p53 and supports
coronavirus replication (Ma-Lauer et al., 2016). While the HCoV-NL63 PL2 pro
stabilizes MDM2 by deubiquitinating it (Yuan et al., 2015), the SARS-CoV
PL2 pro surprisingly does not deubiquitinate RCHY1 (Ma-Lauer et al., 2016).
How does the PL2 pro stabilize RCHY1? The mechanism has yet to be
elucidated.

Besides the functions of PL2 pro discussed above, the enzyme was shown to
interact with other viral proteins. The region from PL2 pro to the C-terminus of
Nsp3 in SARS-CoV can interact with the Nsp2, ORF3a, and ORF9b proteins,
as identified by Y2H and CoIP assays (von Brunn et al., 2007). Through similar
assays, the region PL2 pro-NAB-βSM was found to interact with Nsp4 as well
as Nsp12 (Pan et al., 2008). The SARS-CoV PL2 pro was further shown to bind
ORF7a and Nsp6 by using proteomics analysis (Neuman et al., 2008).

Coronavirus PL pro is an important target for developing antiviral drugs. This
aspect has been well reviewed by Baez-Santos et al. (2015) within this series;
hence, we mention only inhibitors here that have been described since. Two
big challenges exist when designing PL pro inhibitors: 1) the S1 and S2 binding
pockets are tailor-made to accommodate glycine residues and hence they are
small; therefore, identifying suitable peptidomimetic chemical structures is
difficult; 2) many host USPs feature folds and active sites similar to the PL pro s,
so specificity of the inhibitors could be an issue. However, there is a good
chance that the BL2 loop (mentioned above) of CoV PL2 pro s could provide
sufficient uniqueness to solve the specificity problem. This loop is involved in
substrate binding and is different not only between USPs and CoV PL pro s but
also among different CoVs (Hu et al., 2005; Ratia et al., 2006; Lei et al., 2014;
Baez-Santos et al., 2014, 2015; Lee et al., 2015). For example, this loop
comprises 6 amino-acid residues (GNYQCG) in SARS-CoV PL2 pro but 7
(GIETAVG) in the enzyme of MERS-CoV, leading to the inability of SARS-CoV
inhibitors to act on MERS-CoV PL pro (Baez-Santos et al., 2014; Hilgenfeld,
2014; Lee et al., 2015). Using a high-throughput assay, the purine derivative
8-(trifluoromethyl)-9H-purin-6-amine (compound 4; Fig. 7A) was identified as a
competitive MERS-CoV PL2 pro inhibitor, with an IC50 of about 6 µM in vitro (Lee
et al., 2015). Interestingly, this compound is also (moderately) active against
SARS-CoV PL2 pro (IC50 ≈ 11 µM) but acts as an allosteric inhibitor in this case
(Lee et al., 2015). Furthermore, the authors also reported that this inhibitor
shows very high selectivity over human ubiquitin C-terminal hydrolase L1
(hUCH-L1; IC50 > 100 µM), which is one of the host proteins most closely
related to the CoV PL pro (Lee et al., 2015). In contrast, Clasman et al. (2017)
reported that compound 4 shows no selective inhibition of either CoV PL pro s or
host USPs; therefore, this compound could be a pan-assay interference
compound (PAINS). Recently, nine alkylated chalcones (1-9) and four
coumarins (10-13), which were isolated from the perennial plant Angelica
keiskei, had their inhibitory activities against both the SARS-CoV M pro (3CL pro,
chymotrypsin-like protease) and the PL2 pro tested (Park et al., 2016). One of
the chalcones, compound 6 (Fig. 7B), exhibited relatively strong inhibition of
both the 3CL pro and the PL2 pro in vitro, with IC50 values of 11.4 and 1.2 µM,
respectively (Park et al., 2016). Chalcone 6 uses different inhibition
mechanisms for 3CL pro and PL2 pro. It is a competitive inhibitor of the former
enzyme but a non-competitive one of the latter (Park et al., 2016). Clearly, the
large body of structural information available for the CoV PL pro s and host DUBs
should facilitate the design of inhibitors specific for the viral enzyme.
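
The selectivity argument in the preceding paragraph can be made concrete with a
small calculation. The following Python sketch is illustrative only and is not
part of any of the cited studies. It takes the approximate IC50 values quoted
in this section (in µM) and computes a simple ratio of off-target IC50 to
target IC50. The helper name `selectivity_ratio` is our own, and because the
hUCH-L1 value is reported only as a lower limit (> 100 µM), the ratios obtained
for compound 4 are lower bounds rather than measured values.

```python
# Approximate IC50 values (µM) as quoted in the text of this section.
IC50_COMPOUND_4 = {
    "MERS-CoV PL2pro": 6.0,    # "about 6 µM" (Lee et al., 2015)
    "SARS-CoV PL2pro": 11.0,   # "about 11 µM" (Lee et al., 2015)
}
IC50_HUCHL1_LOWER_LIMIT = 100.0  # reported only as "> 100 µM"

IC50_CHALCONE_6 = {
    "SARS-CoV 3CLpro": 11.4,   # Park et al., 2016
    "SARS-CoV PL2pro": 1.2,    # Park et al., 2016
}


def selectivity_ratio(off_target_ic50: float, target_ic50: float) -> float:
    """Return off-target IC50 divided by target IC50 (hypothetical helper)."""
    if target_ic50 <= 0:
        raise ValueError("IC50 values must be positive")
    return off_target_ic50 / target_ic50


# Lower bounds on the selectivity of compound 4 over hUCH-L1.
compound_4_bounds = {
    enzyme: selectivity_ratio(IC50_HUCHL1_LOWER_LIMIT, ic50)
    for enzyme, ic50 in IC50_COMPOUND_4.items()
}

# Ratio of the two IC50 values of chalcone 6 (3CLpro over PL2pro).
chalcone_6_preference = selectivity_ratio(
    IC50_CHALCONE_6["SARS-CoV 3CLpro"], IC50_CHALCONE_6["SARS-CoV PL2pro"]
)

if __name__ == "__main__":
    for enzyme, bound in compound_4_bounds.items():
        print(f"compound 4, {enzyme}: selectivity over hUCH-L1 > {bound:.1f}")
    print(f"chalcone 6: IC50(3CLpro) / IC50(PL2pro) = {chalcone_6_preference:.1f}")
```

Ratios of this kind are only a coarse guide: they depend on assay conditions,
and, as noted above, Clasman et al. (2017) did not observe selective inhibition
by compound 4.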

6. Nucleic Acid-Binding (NAB) domain and betacoronavirus-specific
marker (βSM) domain

The nucleic-acid binding (NAB) and betacoronavirus-specific marker (βSM)
domains together are also named “Nsp3e” (Neuman et al., 2008). The latter
domain alone was previously called “group 2-specific marker” (G2M) (Neuman
et al., 2008). The NAB and βSM domains exist in the genus Betacoronavirus.
The corresponding region is absent in alphacoronaviruses and
deltacoronaviruses (Neuman, 2016). In gammacoronaviruses, there is a
gammacoronavirus-specific marker (γSM) domain at this position (Neuman,
2016).

Structural information on this region is very limited for all coronaviruses. Thus
far, only an NMR structure of the NAB domain of SARS-CoV is available
(Table 1; Fig. 8; Serrano et al., 2009). The structure comprises two antiparallel
β sheets (β1+β6; β2+β8) and one parallel β sheet (β3-β4-β5-β7) as well as
two α helices and two 3₁₀ helices (η1 and η2) in the order
β1-β2-β3-α1-β4-β5-η1-η2-β6-β7-α2-β8. Four β strands (β3-β4-β5-β7)
and two helices (α1, α2) form a “half-barrel”. The structure of the NAB
represents a unique fold (Serrano et al., 2009). The domain has been shown to
bind ssRNA as well as to unwind dsDNA (Neuman et al., 2008). When binding
to ssRNA, the NAB prefers sequences with repeats of three consecutive Gs
(Serrano et al., 2009), such as (GGGA)5 and (GGGA)2. A positively charged
surface patch (Lys75, Lys76, Lys99, and Arg106) is involved in RNA binding 
(Fig. 8). These residues are located in the loop between η2 and β6 as well as
in helix α2 (Serrano et al., 2009). The RNA binding behavior of the NAB
appears to be similar to that of SARS-CoV Mac3 (SUD-M), which has a
specificity for oligo(G) (Tan et al., 2007, 2009), although the latter is also
reported to bind oligo(A) (Chatterjee et al., 2009; Johnson et al., 2010;
mentioned above). Whether there is a functional relation between Mac3 and
NAB remains to be investigated.

Currently, no structural information is available concerning the βSM or γSM,
and nothing is known about the function of these modules either. A gene
encoding the βSM domain of SARS-CoV could not be expressed in E. coli; this
module has been predicted to be a nonenzymatic domain (Neuman et al.,
2008). In the absence of sequence similarity to any domain of known function,
we performed an ab-initio protein structure prediction using the sequence of
the SARS-CoV βSM domain and the QUARK online server (Xu & Zhang,
2012). The result indicates that most of this region is intrinsically disordered.
This does not preclude that it might adopt a defined structure upon interaction
with another Nsp, RNA, or a host protein.

7. Transmembrane regions (TM1 and TM2), Nsp3 ectodomain, Y1 domain, 
and CoV-Y domain 

This part of Nsp3 includes two transmembrane regions as well as three soluble
domains, which together constitute about one third of the multidomain protein.
The two transmembrane regions are TM1 and TM2, while the three domains
are the Nsp3 ectodomain (3Ecto), Y1, and CoV-Y. The sequential order of this
part is TM1-3Ecto-TM2-Y1-CoV-Y (Fig. 1A and B). Even though this part
exists in all coronaviruses (Neuman et al., 2008; Neuman, 2016), thus far, no
three-dimensional structure is available for the entire region or for any part of it.

Nsp3 of CoVs is thought to pass the ER membrane twice, since there are two
predicted transmembrane regions, TM1 and TM2 (Harcourt et al., 2004;
Kanjanahaluethai et al., 2007; Oostra et al., 2008). According to the
transmembrane region prediction server TMHMM (Krogh et al., 2001), there is
a total of three hydrophobic regions in SARS-CoV Nsp3 (Table 1; Fig. 1B).
Oostra et al. (2008) proposed that the first two of the three hydrophobic
regions span the membrane while the last one (AH1), which has more
amphipathic character, does not (Fig. 1B). Thus, the 3Ecto would be the only
domain located on the lumenal side of the ER in SARS-CoV Nsp3 (Fig. 1B).
The 3Ecto is thought to bind metal ions and was previously designated a
zinc-finger (ZF) domain (Neuman et al., 2008). Neuman (2016) found
that the metal-binding Cys-His cluster is not conserved in all CoVs and
renamed this domain “3Ecto”. In fact, only two cysteine residues are
conserved in the CoV 3Ecto domain (Fig. 9A), hence this domain is unlikely to
be a zinc-finger domain. The transmembrane regions plus the 3Ecto are
important for the PL2 pro to process the Nsp3↓4 cleavage site in SARS-CoV
and MHV (Harcourt et al., 2004; Kanjanahaluethai et al., 2007); a possible
reason is that the transmembrane part could bring the PL2 pro close to the
cleavage site between the membrane-associated proteins Nsp3 and Nsp4.
Asparagine (N)-linked glycosylation has been found in the 3Ecto domains of
SARS-CoV and MHV (Fig. 1B and 9; Harcourt et al., 2004; Kanjanahaluethai
et al., 2007). It is unclear if the N-glycan modification affects the 3Ecto
conformation or stability. Frequently, N-linked glycans serve as recognition
points for partner molecules (Aebi, 2013). It has been shown that interaction of
the 3Ecto with the lumenal loop of Nsp4 is essential for the ER rearrangements
occurring in cells infected by SARS-CoV or MHV (the 3Ecto is named “lumenal
loop of Nsp3” in that paper; Hagemeijer et al., 2014).
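
The proposed topology follows a simple rule: starting in the cytosol, the
polypeptide chain changes sides of the ER membrane each time it passes a
membrane-spanning region, whereas the amphipathic AH1 does not cross the
membrane. The Python sketch below applies this rule to the SARS-CoV residue
ranges listed in Table 1. It is an illustration only, not a published tool: the
segment list (with residues 1-1390 lumped into one N-terminal region), the
`spans_membrane` flags (set according to the proposal of Oostra et al., 2008),
and the function name `assign_sides` are assumptions made for this example.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    name: str
    start: int            # first residue (SARS-CoV Nsp3 numbering, Table 1)
    end: int              # last residue
    spans_membrane: bool  # True if the segment is assumed to cross the ER membrane


# Residue ranges taken from Table 1; lumping residues 1-1390 is an assumption.
SEGMENTS: List[Segment] = [
    Segment("N-terminal region", 1, 1390, False),
    Segment("TM1", 1391, 1413, True),
    Segment("3Ecto", 1414, 1495, False),
    Segment("TM2", 1496, 1518, True),
    Segment("AH1", 1523, 1545, False),  # amphipathic, proposed not to span
    Segment("Y1+CoV-Y", 1546, 1922, False),
]


def assign_sides(
    segments: List[Segment], start_side: str = "cytosol"
) -> List[Tuple[str, str]]:
    """Walk along the chain and switch sides after each membrane-spanning segment."""
    other = {"cytosol": "ER lumen", "ER lumen": "cytosol"}
    side = start_side
    result = []
    for seg in segments:
        if seg.spans_membrane:
            result.append((seg.name, "membrane"))
            side = other[side]
        else:
            result.append((seg.name, side))
    return result


topology = assign_sides(SEGMENTS)
lumenal = [name for name, side in topology if side == "ER lumen"]

if __name__ == "__main__":
    for name, side in topology:
        print(f"{name:18s} {side}")
    print("Lumenal segments:", lumenal)
```

Under these assumptions, the 3Ecto is the only segment assigned to the ER
lumen. If AH1 were flagged as membrane-spanning instead, the same rule would
place Y1 and CoV-Y in the lumen, which would contradict their cytosolic
location.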

The Y1 and CoV-Y domains are located at the cytosolic side of the ER. The Y1
domain is conserved in all viruses of the order Nidovirales, while CoV-Y is
conserved only in coronaviruses (Neuman, 2016). Since no three-dimensional
structure is available for this part, the domain assignment of Y1 and CoV-Y is
ambiguous (Neuman, 2016). We found that the sequence identity of
Y1+CoV-Y between different CoV genera is above 25% and that two Cys-His
clusters are present in the N-terminal part of the Y1 domain, possibly binding
zinc ions (Fig. 9). However, it is still unclear if the fold and function of this
region are conserved. Currently, functional information on this part is limited. It
has been shown that the C-terminal third of Nsp3 (βSM
(partial)-TM1-3Ecto-TM2-AH1-Y1+CoV-Y) binds less efficiently to Nsp4
without the Y1 and CoV-Y domains (Hagemeijer et al., 2014), although these
two domains are not as important for this process as the 3Ecto.


According to Y2H screens, CoIP, as well as GST pull-down assays, different
constructs of Nsp3 with different C-terminal regions were found to interact
with various viral non-structural proteins of SARS-CoV (von Brunn et al., 2007;
Imbert et al., 2008; Pan et al., 2008). For example, a construct comprising the
domains from PL2 pro to the end of Nsp3 can bind Nsp2, ORF3a, and ORF9b
(see above; von Brunn et al., 2007); the NAB-βSM-TM1 of Nsp3 can interact
with Nsp5, Nsp7-8, as well as Nsps 12-16, and Y1 plus CoV-Y interacts
with Nsp9 and Nsp12 (Imbert et al., 2008); in addition, the NAB-βSM-TM1 of
Nsp3 can also interact with other domains within Nsp3, except for Mac1 (X
domain) (Imbert et al., 2008); a PL2 pro-NAB-βSM-TM1 construct of Nsp3 can
bind Nsp4 and Nsp12, while the region from TM1 to the end of Nsp3 only binds
Nsp8 (Pan et al., 2008). It has been found that the interaction between the
C-terminal region of Nsp3 and Nsp4 is essential for the formation of CMs and
DMVs derived from the ER in CoV-infected cells (Angelini et al., 2013;
Hagemeijer et al., 2014). The viral RNA and replicase proteins (Nsps) need to
be associated with these modified membranes to form the replicative
organelles (see Neuman, 2016, for review). In addition, these membranes can
protect the viral RNA and Nsps against nucleases and proteases in vitro (van
Hemert et al., 2008). Apart from the Nsp3-Nsp4 interaction, it is still unclear
whether all the other interactions really exist and how they affect the
viral life cycle. At least, it seems that the membrane-associated region of Nsp3
may regulate the interactions with other viral proteins. More effort clearly
needs to be put into the structural and functional characterization of this
region.

Conclusions 

Overall, the multi-domain Nsp3 plays various roles in coronavirus infection. It
releases Nsp1, Nsp2, and itself from the polyproteins and interacts with other
viral Nsps as well as RNA to form the replication/transcription complex. It acts
on posttranslational modifications of host proteins to antagonize the host
innate immune response (by de-MARylation, de-PARylation (possibly),
deubiquitination, or deISGylation). Meanwhile, Nsp3 itself is modified in host
cells, namely by N-glycosylation of the 3Ecto domain. Furthermore, Nsp3 can 
interact with host proteins (such as RCHY1) to support virus survival. 

As the largest non-structural protein of CoVs, Nsp3 has also been identified as
the major selective target for driving evolution in lineage C betaCoVs on the
basis of a high rate of positively selected mutation sites (Forni et al., 2016).
Furthermore, the adaptive evolution of Nsp3 of MERS-CoV is still ongoing
(Forni et al., 2016). For example, the Arg911Cys mutation (located in the palm
subdomain of the PL2 pro, corresponding to Arg283 in Lei et al., 2014) of Nsp3
exists in the viral strain KOR/KNIH responsible for the 2015 South Korean
outbreak but not in the ancestral strain EMC/2012 (Forni et al., 2016). It is
interesting to speculate why coronaviruses keep many essential functions in
one protein, while this protein shows a high rate of genetic diversity during CoV
evolution. In the end, increased research efforts into the structure and function
of Nsp3 are needed to achieve a more complete understanding of this protein.

Legends to figures

Fig. 1. Genome organization of coronaviruses; Nsp3 domains and their
functions. (A) The 5'-terminal two thirds of the CoV genome comprise ORF1a
and ORF1b. ORF1a encodes the polyprotein 1a (Nsp1-11) while ORF1a plus
ORF1b produce the polyprotein 1ab (Nsp1-16) through a ribosomal frameshift
overreading the stop codon of ORF1a (indicated by a black arrow). The
3'-proximal third encodes the structural proteins S, E, M, and N as well as
accessory proteins. The polyproteins pp1a and pp1ab are processed by the
viral proteases PL1 pro, PL2 pro (both domains of Nsp3), and M pro (3CL pro, Nsp5).
The domain organization of Nsp3 is different in different CoV genera. The
Nsp3 of HCoV NL63 as a representative of alpha-CoVs, and of SARS-CoV in
clade B of the genus beta-CoV, are shown enlarged. The question mark within
HCoV-NL63 Nsp3 indicates a region of unknown function and structure. (B)
Summary of the functions and domain organization of SARS-CoV Nsp3. Nsp3
is bound to double-membrane vesicles recruited from the endoplasmic
reticulum (ER) membrane. The protein passes through this membrane twice,
via the two transmembrane regions TM1 and TM2. AH1 is possibly an
amphipathic helix attached to the ER membrane, next to TM2. Except for the
3Ecto domain, all other Nsp3 domains are located in the cytosol. All domains
with known three-dimensional structures are indicated in light green (X-ray
structures) or orange (NMR structures), whereas parts with unknown structure
are in red. The best characterized functions of each domain of Nsp3 are shown.
Also indicated are the glycosylation sites in the 3Ecto domain (Asn1431 and
Asn1434; Harcourt et al., 2004).

Fig. 2. Structures (in cartoon view) of the ubiquitin-like domain 1 (Ubl1) and
Ubl2 in SARS-CoV, Ubl1 in MHV, as well as their structural homologues. (A)
Ubl1 (residues 20-108) of SARS-CoV (PDB entry: 2IDY; Serrano et al., 2007).
(B) Ubl1 (19-114) of MHV (PDB entry: 2M0A; Keane and Giedroc, 2013). (C)
Ubl2 (residues 1-60) of SARS-CoV (PDB entry: 2FE8; Ratia et al., 2006). (D)
human ubiquitin (PDB entry: 1UBQ; Vijay-Kumar et al., 1987). (E) human
interferon-stimulated gene 15 (hISG15; PDB entry: 1Z2M; Narasimhan et al.,
2005). hISG15 contains two linked ubiquitin-like domains; here, the N-terminal
Ubl domain is shown. (F) the Ras-interacting domain of RalGDS (PDB entry:
1LFD; Huang et al., 1998). The N and C termini of all structures are marked.
All α and 3₁₀ (η) helices are labeled and shown in cyan. β strands are in purple
and loops are in brown. This figure as well as Figs. 3, 5, and 8 were generated
using Chimera (Pettersen et al., 2004).

Fig. 3. Crystal structure of the papain-like protease domain 1 (PL1 pro) of TGEV.
Cartoon view of the overall structure (PDB entry: 3MP2; Wojdyla et al., 2010).
The thumb, fingers, and palm subdomains are shown in blue, brown, and
green, respectively. The Cα atoms of the catalytic triad residues
(Cys32-His183-Asp196) are displayed as yellow, blue, and red spheres.
Residue Gln27 contributing to the oxyanion hole is shown in ball & stick style.
Ile155, Thr209, and Tyr175 forming the S4 pocket are labeled; Ile155 is in
black and the latter two are in red. The N and C termini of the PL1 pro are
indicated.

Fig. 4. Structure of the MERS-CoV macrodomain I (Mac1, X domain) in
complex with ADP-ribose (ADPr) (PDB entry: 5HOL). The protein features an
α/β/α sandwich fold. The central β sheet with the strand order
β1-β2-β7-β6-β3-β5-β4 is shown in purple; β1 and β4 are labeled. An Fo-Fc
omit difference map of ADPr is shown in black (contoured at 4.0 σ). The ADPr
itself is displayed as brown sticks. The five regions (blue) relating to ADPr
binding are marked by Roman numerals I - V. Fixing the two ends of the ADPr,
Asp21 and Asn39 are displayed by thicker red sticks. The O2' of ADPr forms a
hydrogen bond with a water molecule (H2O 308; green sphere) being
stabilized by the side-chain of Asn155. The “GGG” triple-glycine motif is
displayed in black. H2O 310 (green sphere) corresponds to a water molecule
that has been proposed to mediate a nucleophilic attack onto the C1'' atom of
the ADPr in the de-MARylation reaction catalyzed by the VEEV X domain (Li et
al., 2016a). The N and C termini of the X domain are marked. This figure and
Fig. 6 were prepared using PyMOL (Schrödinger; http://www.pymol.org/).

Fig. 5. Structures (in cartoon style) of the macrodomains II (Mac2) and III
(Mac3), of the Domain Preceding Ubl2 and PL2 pro (DPUP) of SARS-CoV and
MHV, as well as of the frataxin-like fold protein Yfh1. (A) and (B) Mac2 and
Mac3 (PDB entry: 2W2G; Tan et al., 2009). Both domains possess the α/β/α
sandwich fold. The central six β strands in the order β1-β6-β5-β2-β4-β3 are
displayed in purple. A predominantly positively charged surface patch
(Lys563+Lys565+Lys568+Glu571; Nsp3 numbering) of Mac3 being involved in
binding oligo(G) (Kusov et al., 2015) is labeled. (C) The SARS-CoV DPUP
NMR structure (PDB entry: 2KQW; Johnson et al., 2010). (D) The MHV DPUP
X-ray crystal structure (PDB entry: 4YPT; Chen et al., 2015). (E) Structure of
the yeast frataxin-like protein Yfh1, as determined by NMR spectroscopy (PDB
entry: 2GA5; He et al., 2004). All structures shown in (C), (D), and (E) display
the typical frataxin-like fold. Two α helices located at the N- and C-termini of
each structure form one plane and the β sheet forms the other plane. The
negatively charged residues (Asp or Glu) in the first α helix (α1) are shown in
red (in (C), (D), and (E)); they are possibly involved in binding metal ions. The
N and C termini of all structures are marked.

Fig. 6. Structure of the SARS-CoV papain-like protease 2 (PL2 pro) in complex
with Lys48-linked diubiquitin (PDB entry: 5E6J; Bekes et al., 2016). The Ubl2
is shown as a grey cartoon. The catalytic domain (PL2 pro) is displayed in
surface view. The thumb, fingers, and palm subdomains are shown in blue,
light brown, and green, respectively. The blocking loop 2 (BL2) is depicted in
red. The Lys48-linked diubiquitin is displayed as a light-blue cartoon. Lys48 of
Ub1 is linked to the C-terminal Gly75 of Ub2 (black sticks) via a triazole (red
sticks). The N and C termini of Ub1 (N1, C1) as well as the N terminus of Ub2
(N2) are marked. The conserved hydrophobic patches (Ile44, Ala46, Gly47) of
Ub1 and Ub2 are indicated by purple and orange dots, respectively. The
residue Phe70 (yellow) interacting with the hydrophobic patch of Ub2 is
labeled. The C-terminal Arg-Leu-Arg-Gly-Gly residues (RLRGG) of Ub1 are
shown in ball & stick style (purple). P3-Arg and P5-Arg are marked.

Fig. 7. Recently described inhibitors of the CoV PL2 pro. (A) Structural formula
of the purine derivative 8-(trifluoromethyl)-9H-purin-6-amine (compound 4).
This compound is a competitive MERS-CoV PL2 pro inhibitor (Lee et al., 2015).
It is also active against SARS-CoV PL2 pro but acts as an allosteric inhibitor in
this case. (B) A natural-product chalcone, compound 6 from the perennial plant
Angelica keiskei, inhibits the SARS-CoV M pro (3CL pro) and PL2 pro in vitro (Park
et al., 2016).

Fig. 8. NMR structure of the nucleic acid-binding (NAB) domain of SARS-CoV
(cartoon style; PDB entry: 2K87; Serrano et al., 2009). The order of
secondary-structure elements is β1-β2-β3-α1-β4-β5-η1-η2-β6-β7-α2-β8.
The overall structure of NAB represents a unique fold. The residues involved in
RNA binding (Lys75, Lys76, Lys99, and Arg106) are displayed in blue. The N
and C termini of the NAB domain are labeled.

Fig. 9. Multiple sequence alignment of the 3Ecto and the N-terminal portion of
the Y1+CoV-Y domains. The conserved cysteines in 3Ecto as well as
cysteines and histidines in the N-terminal portion of Y1 are marked by triangles.
Two glycosylation sites in the 3Ecto domain of SARS-CoV (Asn1431 and
Asn1434; Harcourt et al., 2004) are indicated by asterisks. The corresponding
sequence accession numbers are: SARS-CoV, GenBank: AY274119.3;
MERS-CoV, GenBank: JX869059.2; MHV, GenBank: AY700211.1; HCoV
NL63, GenBank: AY567487.2; IBV, GenBank: M95169.1. The figure was
generated using the program ESPript (Gouet et al., 1999).

Table 1

Structural information on CoV Nsp3 domains and regions.

Domain/region         Res. no.# / MW%     Method  Coronavirus             Reference
Ubl1                  1-112 / 12.6        NMR     SARS-CoV                Serrano et al. (2007)
                                          NMR     MHV                     Keane & Giedroc (2013)
Acidic domain (HVR)   113-183 / 8.3       n. d.
PL1 pro *1            n. a. / 23.6        X-ray   TGEV                    Wojdyla et al. (2010)
Mac1 (X domain)       184-365 / 19.5      X-ray   SARS-CoV                Saikatendu et al. (2005)
                                          X-ray   SARS-CoV                Egloff et al. (2006)
                                          X-ray   HCoV-229E, IBV          Xu et al. (2009)
                                          X-ray   HCoV-229E, IBV          Piotrowski et al. (2009)
                                          X-ray   FCoV                    Wojdyla et al. (2009)
                                          X-ray   MERS-CoV                Cho et al. (2016)
Mac2 (SUD-N)          389-524 / 15.2      X-ray   SARS-CoV *2             Tan et al. (2009)
Mac3 (SUD-M)          525-652 / 14.0      NMR     SARS-CoV                Chatterjee et al. (2009)
                                          X-ray   SARS-CoV *2             Tan et al. (2009)
                                          NMR     SARS-CoV *3             Johnson et al. (2010)
DPUP (SUD-C)          653-720 / 7.8       NMR     SARS-CoV                Johnson et al. (2010)
                                          X-ray   MHV *4                  Chen et al. (2015)
                                          NMR     HKU9                    Hammond et al. (2017)
Ubl2-PL2 pro          723-1036 / 35.2     X-ray   SARS-CoV                Ratia et al. (2006)
                                          X-ray   SARS-CoV + human Ub     Chou et al. (2014)
                                          X-ray   SARS-CoV + human Ub     Ratia et al. (2014)
                                          X-ray   SARS-CoV + diUb         Bekes et al. (2016)
                                          X-ray   SARS-CoV + hISG15 *5    Daczkowski et al. (2017)
                                          X-ray   SARS-CoV + mISG15 *6    Daczkowski et al. (2017)
                                          X-ray   MERS-CoV                Lei et al. (2014)
                                          X-ray   MERS-CoV                Lee et al. (2015)
                                          X-ray   MERS-CoV + human Ub     Bailey-Elkin et al. (2014)
                                          X-ray   MERS-CoV + human Ub     Lei & Hilgenfeld (2016)
PL2 pro               n. a. / 28.6        X-ray   MERS-CoV                Clasman et al. (2017)
Ubl2-PL2 pro                              X-ray   IBV                     Kong et al. (2015)
Ubl2-PL2 pro                              X-ray   MHV *4                  Chen et al. (2015)
NAB                   1066-1180 / 13.0    NMR     SARS-CoV                Serrano et al. (2009)
βSM (G2M)             1203-1318 / 12.5    n. d.
TM1                   1391-1413† / 2.4    n. d.
3Ecto                 1414-1495 / 9.0     n. d.
TM2                   1496-1518† / 2.7    n. d.
AH1                   1523-1545† / 2.7    n. d.
Y1 + CoV-Y            1546-1922 / 41.9    n. d.

#: Nsp3 of the SARS-CoV strain TOR2 (GenBank: AY274119.3); %: molecular mass (kD); n. d.: structure is not
determined; *1: absent in SARS-CoV; n. a.: does not apply (residue numbers are only given for SARS-CoV); *2:
Mac2-Mac3 structure; *3: Mac3-DPUP structure; *4: DPUP-Ubl2-PL2 pro structure; *5: structure of Ubl2-PL2 pro with
the C-terminal Ubl domain of human ISG15; *6: structure of Ubl2-PL2 pro with the C-terminal Ubl domain of mouse
ISG15; †: regions are predicted by the TMHMM server v. 2.0 (Krogh et al., 2001). TM1 and TM2 are transmembrane
regions while AH1 is not (Oostra et al., 2008).

Table 2

Cleavage sites of PL1 pro and PL2 pro in CoVs and the P5-P2' residues for each cleavage
site. For each virus, the first row gives the sequences and the second row the processing protease.

Virus          Nsp1↓2              Nsp2↓3              Nsp3↓4      Reference
TGEV           RTGRG AI            NKMGG GD            PKSGS GF    Putics et al. (2006)
               n. d.               PL1 pro             n. d.
HCoV NL63      GHGAG SV            TKLAG GK            AKQGA GF    Chen et al. (2007)
               PL1 pro             PL2 pro             PL2 pro
HCoV 229E      KRGGG NV            TKAAG GK            AKQGA GD    Ziebuhr et al. (2007)
               PL1 pro > PL2 pro   PL1 pro < PL2 pro   n. d.
MHV            KGYRG VK            RFPCA GK            SLKGG AV    Bonilla et al. (1997); Kanjanahaluethai & Baker (2000)
               PL1 pro             PL1 pro             PL2 pro
SARS-CoV #1    ELNGG AV            RLKGG AP            SLKGG KI    Harcourt et al. (2004)
               PL2 pro             PL2 pro             PL2 pro
MERS-CoV #1    KLIGG DV            RLKGG AP            KIVGG AP    Yang et al. (2014)
               PL2 pro             PL2 pro             PL2 pro
IBV #2         /                   VCKAG GK            EKKAG GI    Lim et al. (2000)
               /                   PL2 pro             PL2 pro

n. d., not determined; #1: absence of PL1 pro; #2: partial presence of PL1 pro; /: absence of the cleavage site.
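
The preference of the PL pro s for small residues at the P1 and P2 positions can
be read directly from Table 2. The Python sketch below tallies the P2-P1
residue pairs of the listed cleavage sites. It is an illustration under stated
assumptions: the sequences are transcribed from the table (P5-P1 residues
only), the split of the HCoV-NL63 Nsp1↓2 entry into P5-P1 "GHGAG" and P1'-P2'
"SV" is our reading of the table, and the helper name `p2_p1_counts` is
hypothetical.

```python
from collections import Counter
from typing import Dict, List

# P5-P1 residues of the Nsp1/2, Nsp2/3 and Nsp3/4 cleavage sites (Table 2).
# IBV lacks the Nsp1/2 cleavage site, so only two entries are listed for it.
CLEAVAGE_SITES: Dict[str, List[str]] = {
    "TGEV":      ["RTGRG", "NKMGG", "PKSGS"],
    "HCoV-NL63": ["GHGAG", "TKLAG", "AKQGA"],  # first entry: assumed split
    "HCoV-229E": ["KRGGG", "TKAAG", "AKQGA"],
    "MHV":       ["KGYRG", "RFPCA", "SLKGG"],
    "SARS-CoV":  ["ELNGG", "RLKGG", "SLKGG"],
    "MERS-CoV":  ["KLIGG", "RLKGG", "KIVGG"],
    "IBV":       ["VCKAG", "EKKAG"],
}


def p2_p1_counts(sites: Dict[str, List[str]]) -> Counter:
    """Count the two-residue P2-P1 motifs over all listed cleavage sites."""
    return Counter(seq[-2:] for seqs in sites.values() for seq in seqs)


motif_counts = p2_p1_counts(CLEAVAGE_SITES)
total_sites = sum(motif_counts.values())
gly_at_p1 = sum(n for motif, n in motif_counts.items() if motif.endswith("G"))

if __name__ == "__main__":
    print(f"{total_sites} cleavage sites in total")
    for motif, n in motif_counts.most_common():
        print(f"P2-P1 = {motif}: {n}")
    print(f"Gly at P1: {gly_at_p1} of {total_sites}")
```

In this tally, all six SARS-CoV and MERS-CoV sites carry Gly-Gly at P2-P1,
which fits the small S1 and S2 pockets described for the PL pro s in the text.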


[Fig. 1 (image not reproduced). (A) Genome organization of coronaviruses and Nsp3 domain
organization of an alpha-CoV and of a clade B beta-CoV. (B) Membrane topology (lumen/cytoplasm)
and functions of the SARS-CoV Nsp3 domains. See the legend to Fig. 1.]

Nonstructural protein 3 (~200 kD) is a multifunctional protein comprising up to 16
different domains and regions. 

Nsp3 binds to viral RNA, nucleocapsid protein, as well as other viral proteins, and 
participates in polyprotein processing. 

Through its de-ADP-ribosylating, de-ubiquitinating, and de-ISGylating activities, Nsp3 
counteracts host innate immunity. 

Structural data are available for the N-terminal two thirds of Nsp3, but domains in the 
remainder are poorly characterized. 

The papain-like protease of Nsp3 is an established target for new antivirals. 



